CHAPTER 6


Derivatives and 
Conditional Probability 


SECTION 31. DERIVATIVES ON THE LINE* 


This section on Lebesgue's theory of derivatives for real functions of a real 
variable serves to introduce the general theory of Radon-Nikodym
derivatives, which underlies the modern theory of conditional probability. The
results here are interesting in themselves and will be referred to later for 
purposes of illustration and comparison, but they will not be required in 
subsequent proofs. 


The Fundamental Theorem of Calculus 


To what extent are the operations of integration and differentiation inverse 
to one another? A function F is by definition an indefinite integral of another 
function f on [a,b] if 


(31.1)    F(x) - F(a) = ∫_a^x f(t) dt

for a ≤ x ≤ b; F is by definition a primitive of f if it has derivative f:

(31.2)    F'(x) = f(x)


for a < x < b. According to the fundamental theorem of calculus (see (17.5)),
these concepts coincide in the case of continuous f:


Theorem 31.1. Suppose that f is continuous on [a, b]. 


(i) An indefinite integral of f is a primitive of f: if (31.1) holds for all x in 
[a, b], then so does (31.2). 


(ii) A primitive of f is an indefinite integral of f: if (31.2) holds for all x in 
[a, b], then so does (31.1). 


*This section may be omitted.




A basic problem is to investigate the extent to which this theorem holds if 
f is not assumed continuous. First consider part (i). Suppose f is integrable, 
so that the right side of (31.1) makes sense. If f is 0 for x < m and 1 for
x ≥ m (a < m < b), then an F satisfying (31.1) has no derivative at m. It is
thus too much to ask that (31.2) hold for all x. On the other hand, according 
to a famous theorem of Lebesgue, if (31.1) holds for all x, then (31.2) holds 
almost everywhere—that is, except for x in a set of Lebesgue measure 0. In 
this section almost everywhere will refer to Lebesgue measure only. This 
result, the most one could hope for, will be proved below (Theorem 31.3). 

Now consider part (ii) of Theorem 31.1. Suppose that (31.2) holds almost 
everywhere, as in Lebesgue’s theorem, just stated. Does (31.1) follow? The 
answer is no: If f is identically 0, and if F(x) is 0 for x < m and 1 for x ≥ m
(a < m < b), then (31.2) holds almost everywhere, but (31.1) fails for x ≥ m.
The question was wrongly posed, and the trouble is not far to seek: If f is 
integrable and (31.1) holds, then 


(31.3)    F(x + h) - F(x) = ∫ I_{(x, x+h)}(t) f(t) dt → 0


as h ↓ 0 by the dominated convergence theorem. Together with a similar
argument for h ↑ 0 this shows that F must be continuous. Hence the question
becomes this: If F is continuous and f is integrable, and if (31.2) holds 
almost everywhere, does (31.1) follow? The answer, strangely enough, is still 
no: In Example 31.1 there is constructed a continuous, strictly increasing F 
for which F'(x) = 0 except on a set of Lebesgue measure 0, and (31.1) is of 
course impossible if f vanishes almost everywhere and F is strictly increas- 
ing. This leads to the problem of characterizing those F for which (31.1) does 
follow if (31.2) holds outside a set of Lebesgue measure 0 and f is integrable. 
In other words, which functions are the integrals of their (almost everywhere) 
derivatives? Theorem 31.8 gives the characterization. 


It is possible to extend part (ii) of Theorem 31.1 in a different direction. Suppose 
that (31.2) holds for every x, not just almost everywhere. In Example 17.4 there was 
given a function F, everywhere differentiable, whose derivative f is not integrable, 
and in this case the right side of (31.1) has no meaning. If, however, (31.2) holds for 
every x, and if f is integrable, then (31.1) does hold for all x. For most purposes of 
probability theory, it is natural to impose conditions only almost everywhere, and so 
this theorem will not be proved here. 


The program then is first to show that (31.1) for integrable f implies that
(31.2) holds almost everywhere, and second to characterize those F for which
the reverse implication is valid. For the most part, f will be nonnegative and
F will be nondecreasing. This is the case of greatest interest for probability
theory; F can be regarded as a distribution function and f as a density.


†For a proof, see RUDIN, p. 179.


In Chapters 4 and 5 many distribution functions F were either shown to 
have a density f with respect to Lebesgue measure or were assumed to have 
one, but such F's were never intrinsically characterized, as they will be in this 
section. 


Derivatives of Integrals 


The first step is to show that a nondecreasing function has a derivative almost
everywhere. This requires two preliminary results. Let λ denote Lebesgue
measure.


Lemma 1. Let A be a bounded linear Borel set, and let ℐ be a collection
of open intervals covering A. Then ℐ contains a finite, disjoint subcollection
I_1, ..., I_k for which Σ_{i=1}^k λ(I_i) ≥ λ(A)/6.


PROOF. By regularity (Theorem 12.3) A contains a compact subset K
satisfying λ(K) ≥ λ(A)/2. Choose in ℐ a finite subcollection ℐ_0 covering
K. Let I_1 be an interval in ℐ_0 of maximal length; discard from ℐ_0 the
interval I_1 and all the others that intersect I_1. Among the intervals remaining
in ℐ_0, let I_2 be one of maximal length; discard I_2 and all intervals that
intersect it. Continue this way until ℐ_0 is exhausted. The I_i are disjoint. Let
J_i be the interval with the same midpoint as I_i and three times the length. If
I is an interval in ℐ_0 that is cast out because it meets I_i, then I ⊂ J_i. Thus
each discarded interval is contained in one of the J_i, and so the J_i cover K.
Hence Σ_i λ(I_i) = Σ_i λ(J_i)/3 ≥ λ(K)/3 ≥ λ(A)/6. ∎
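
The selection rule in this proof is a greedy algorithm, and it can be tried on a small finite cover. The following is only a sketch in Python, not part of the text: the function names greedy_disjoint and union_length and the sample intervals are illustrative assumptions. It checks the step of the argument that needs no regularity, namely that three times the total length of the chosen intervals is at least the length of the union of the finite cover.

```python
def greedy_disjoint(intervals):
    """Greedy rule from the proof of Lemma 1 (illustrative helper name):
    keep a longest remaining open interval, discard every interval meeting it."""
    remaining = sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)
    chosen = []
    while remaining:
        a, b = remaining[0]
        chosen.append((a, b))
        # Open intervals (c, d) and (a, b) intersect iff c < b and a < d.
        remaining = [(c, d) for (c, d) in remaining[1:] if not (c < b and a < d)]
    return chosen


def union_length(intervals):
    """Lebesgue measure of a finite union of intervals."""
    total = 0
    cur_a = cur_b = None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total


if __name__ == "__main__":
    # Hypothetical finite cover, chosen only for illustration.
    cover = [(0, 4), (3, 5), (4.5, 6), (5.5, 9), (8, 10)]
    chosen = greedy_disjoint(cover)
    print(chosen, sum(b - a for a, b in chosen), union_length(cover))
```

Because each discarded interval lies inside the tripled copy of the chosen interval that eliminated it, the tripled intervals cover the union; the factor 6 in the lemma comes from combining this factor 3 with the factor 2 lost in passing from A to K.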
If

(31.4)    Δ: a = a_0 < a_1 < ··· < a_k = b


is a partition of an interval [a, b] and F is a function over [a, b], let

(31.5)    ||F||_Δ = Σ_{i=1}^k |F(a_i) - F(a_{i-1})|.


Lemma 2. Consider a partition (31.4) and a nonnegative θ. If

(31.6)    F(a) ≤ F(b),

and if

(31.7)    (F(a_i) - F(a_{i-1})) / (a_i - a_{i-1}) ≤ -θ

for a set of intervals [a_{i-1}, a_i] of total length d, then

          ||F||_Δ ≥ |F(b) - F(a)| + 2θd.

This also holds if the inequalities in (31.6) and (31.7) are reversed and -θ is
replaced by θ in the latter.


PROOF. The figure shows the case where k = 2 and the left-hand interval
satisfies (31.7). Here F falls at least θd over [a, a + d], rises the same amount
over [a + d, u], and then rises F(b) - F(a) over [u, b].


For the general case, let Σ' denote summation over those i satisfying (31.7)
and let Σ'' denote summation over the remaining i (1 ≤ i ≤ k). Then

  ||F||_Δ = Σ' (F(a_{i-1}) - F(a_i)) + Σ'' |F(a_i) - F(a_{i-1})|
          ≥ Σ' (F(a_{i-1}) - F(a_i)) + |Σ'' (F(a_i) - F(a_{i-1}))|
          = Σ' (F(a_{i-1}) - F(a_i)) + |(F(b) - F(a)) + Σ' (F(a_{i-1}) - F(a_i))|.

As all the differences in this last expression are nonnegative, the absolute-
value bars can be suppressed; therefore,

  ||F||_Δ ≥ F(b) - F(a) + 2 Σ' (F(a_{i-1}) - F(a_i))
          ≥ F(b) - F(a) + 2θ Σ' (a_i - a_{i-1}). ∎
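
As a numerical sanity check of Lemma 2, the variation (31.5) and the lower bound |F(b) - F(a)| + 2θd can be computed for a function known only at the partition points. This is a sketch in Python under stated assumptions: the helper names variation and lemma2_lower_bound, and the sample partition and values, are made up for illustration.

```python
def variation(values):
    """||F||_Δ of (31.5), given F at the partition points a_0 < ... < a_k."""
    return sum(abs(values[i] - values[i - 1]) for i in range(1, len(values)))


def lemma2_lower_bound(points, values, theta):
    """|F(b) - F(a)| + 2*theta*d, where d is the total length of the
    partition intervals on which the difference quotient is <= -theta."""
    d = 0
    for i in range(1, len(points)):
        length = points[i] - points[i - 1]
        slope = (values[i] - values[i - 1]) / length
        if slope <= -theta:
            d += length
    return abs(values[-1] - values[0]) + 2 * theta * d


if __name__ == "__main__":
    # Hypothetical data with F(a) <= F(b), as (31.6) requires.
    points = [0, 1, 2, 4]
    values = [0, -2, 1, 3]
    print(variation(values), lemma2_lower_bound(points, values, theta=1))
```

In this example only the first interval satisfies (31.7) with θ = 1, so d = 1 and the bound is 3 + 2 = 5, while the variation is 7.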


A function F has at each x four derivates, the upper and lower right
derivatives

  D^F(x) = lim sup_{h↓0} (F(x + h) - F(x)) / h,

  D_F(x) = lim inf_{h↓0} (F(x + h) - F(x)) / h,

and the upper and lower left derivatives

  ^FD(x) = lim sup_{h↓0} (F(x) - F(x - h)) / h,

  _FD(x) = lim inf_{h↓0} (F(x) - F(x - h)) / h.


There is a derivative at x if and only if these four quantities have a common
value. Suppose that F has finite derivative F'(x) at x. If u ≤ x ≤ v, then

  |(F(v) - F(u))/(v - u) - F'(x)|
      ≤ ((v - x)/(v - u)) |(F(v) - F(x))/(v - x) - F'(x)|
        + ((x - u)/(v - u)) |(F(x) - F(u))/(x - u) - F'(x)|.

Therefore,

(31.8)    (F(v) - F(u)) / (v - u) → F'(x)


as u ↑ x and v ↓ x; that is to say, for each ε there is a δ such that u ≤ x ≤ v
and 0 < v - u < δ together imply that the quantities on either side of the
arrow differ by less than ε.

Suppose that F is measurable and that it is continuous except possibly at
countably many points. This will be true if F is nondecreasing or is the
difference of two nondecreasing functions. Let M be a countable, dense set
containing all the discontinuity points of F; let r_n(x) be the smallest number
of the form k/n exceeding x. Then

  D^F(x) = lim_{n→∞} sup_{x < y < r_n(x), y ∈ M} (F(y) - F(x)) / (y - x);

the function inside the limit is measurable because the x-set where it exceeds
α is

  ∪_{y ∈ M} [x: x < y < r_n(x), F(y) - F(x) > α(y - x)].


Thus D^F(x) is measurable, as are the other three derivates. This does not
exclude infinite values. The set where the four derivates have a common
finite value F' is therefore a Borel set. In the following theorem, set F' = 0
(say) outside this set; F' is then a Borel function.


Theorem 31.2. A nondecreasing function F is differentiable almost everywhere,
the derivative F' is nonnegative, and

(31.9)    ∫_a^b F'(t) dt ≤ F(b) - F(a)

for all a and b.


This and the following theorems can also be formulated for functions over
an interval.


PROOF. If it can be shown that

(31.10)   D^F(x) ≤ _FD(x)


except on a set of Lebesgue measure 0, then by the same result applied to
G(x) = -F(-x) it will follow that ^FD(x) = D^G(-x) ≤ _GD(-x) = D_F(x)
almost everywhere. This will imply that D_F(x) ≤ D^F(x) ≤ _FD(x) ≤ ^FD(x) ≤
D_F(x) almost everywhere, since the first and third of these inequalities are
obvious, and so, outside a set of Lebesgue measure 0, F will have a
derivative, possibly infinite. Since F is nondecreasing, F' must be nonnegative,
and once (31.9) is proved, it will follow that F' is finite almost everywhere.

If (31.10) is violated for a particular x, then for some pair α, β of rationals
satisfying α < β, x will lie in the set A_{αβ} = [x: _FD(x) < α < β < D^F(x)].
Since there are only countably many of these sets, (31.10) will hold outside a
set of Lebesgue measure 0 if λ(A_{αβ}) = 0 for all α and β.

Put G(x) = F(x) - ½(α + β)x and θ = ½(β - α). Since differentiation is
linear, A_{αβ} = B_θ = [x: _GD(x) < -θ < θ < D^G(x)]. Since F and G have only
countably many discontinuities, it suffices to prove that λ(C_θ) = 0, where C_θ
is the set of points in B_θ that are continuity points of G. Consider an interval
(a, b), and suppose for the moment that G(a) ≤ G(b). For each x in C_θ
satisfying a < x < b, from _GD(x) < -θ it follows that there exists an open
interval (a_x, b_x) for which x ∈ (a_x, b_x) ⊂ (a, b) and

(31.11)   (G(b_x) - G(a_x)) / (b_x - a_x) < -θ.


There exists by Lemma 1 a finite, disjoint collection (a_{x_i}, b_{x_i}) of these
intervals of total length Σ_i (b_{x_i} - a_{x_i}) ≥ λ((a, b) ∩ C_θ)/6. Let Δ be the
partition (31.4) of [a, b] with the points a_{x_i} and b_{x_i} in the role of the
a_1, ..., a_{k-1}. By Lemma 2,

(31.12)   ||G||_Δ ≥ |G(b) - G(a)| + ⅓θλ((a, b) ∩ C_θ).


If instead of G(a) ≤ G(b) the reverse inequality holds, choose a_x and b_x so
that the ratio in (31.11) exceeds θ, which is possible because D^G(x) > θ for
x ∈ C_θ. Again (31.12) follows.

In each interval [a, b] there is thus a partition (31.4) satisfying (31.12).
Apply this to each interval [a_{i-1}, a_i] in the partition. This gives a partition Δ_1
that refines Δ, and adding the corresponding inequalities (31.12) leads to

  ||G||_{Δ_1} ≥ ||G||_Δ + ⅓θλ((a, b) ∩ C_θ).


Continuing leads to a sequence of successively finer partitions Δ_n such that

(31.13)   ||G||_{Δ_n} ≥ n (θ/3) λ((a, b) ∩ C_θ).


Now ||G||_Δ is bounded by |F(b) - F(a)| + ½|α + β|(b - a) because F is
monotonic. Thus (31.13) is impossible unless λ((a, b) ∩ C_θ) = 0. Since (a, b) can be
any interval, λ(C_θ) = 0. This proves (31.10) and establishes the differentiability
of F almost everywhere.

It remains to prove (31.9). Let

(31.14)   f_n(x) = (F(x + n^{-1}) - F(x)) / n^{-1}.


Now f_n is nonnegative, and by what has been shown, f_n(x) → F'(x) except
on a set of Lebesgue measure 0. By Fatou's lemma and the fact that F is
nondecreasing,

  ∫_a^b F'(x) dx ≤ lim inf_n ∫_a^b f_n(x) dx
                 = lim inf_n [ n ∫_b^{b+n^{-1}} F(x) dx - n ∫_a^{a+n^{-1}} F(x) dx ]
                 ≤ lim inf_n [ F(b + n^{-1}) - F(a) ] = F(b+) - F(a).


Replacing b by b - ε and letting ε → 0 gives (31.9). ∎


Theorem 31.3. If f is nonnegative and integrable, and if F(x) = ∫_{-∞}^x f(t) dt,
then F'(x) = f(x) except on a set of Lebesgue measure 0.


Since f is nonnegative, F is nondecreasing and hence by Theorem 31.2 is 
differentiable almost everywhere. The problem is to show that the derivative 
F' coincides with f almost everywhere. 


PROOF FOR BOUNDED f. Suppose first that f is bounded by M. Define f_n
by (31.14). Then f_n(x) = n ∫_x^{x+n^{-1}} f(t) dt is bounded by M and converges
almost everywhere to F'(x), so that the bounded convergence theorem gives

  ∫_a^b F'(x) dx = lim_n ∫_a^b f_n(x) dx
                 = lim_n [ n ∫_b^{b+n^{-1}} F(x) dx - n ∫_a^{a+n^{-1}} F(x) dx ].


Since F is continuous (see (31.3)), this last limit is F(b) - F(a) = ∫_a^b f(x) dx.

Thus ∫_A F'(x) dx = ∫_A f(x) dx for bounded intervals A = (a, b]. Since these
form a π-system, it follows (Theorem 16.10(iii)) that F' = f almost everywhere.


PROOF FOR INTEGRABLE f. Apply the result for bounded functions to f
truncated at n: If h_n(x) is f(x) or n as f(x) ≤ n or f(x) > n, then H_n(x) =
∫_{-∞}^x h_n(t) dt differentiates almost everywhere to h_n(x) by the case already
treated. Now F(x) = H_n(x) + ∫_{-∞}^x (f(t) - h_n(t)) dt; the integral here is
nondecreasing because the integrand is nonnegative, and it follows by Theorem
31.2 that it has almost everywhere a nonnegative derivative. Since differentiation
is linear, F'(x) ≥ H_n'(x) = h_n(x) almost everywhere. As n was arbitrary,
F'(x) ≥ f(x) almost everywhere, and so ∫_a^b F'(x) dx ≥ ∫_a^b f(x) dx = F(b) - F(a).
But the reverse inequality is a consequence of (31.9). Therefore,
∫_a^b (F'(x) - f(x)) dx = 0, and as before F' = f except on a set of Lebesgue
measure 0. ∎


Singular Functions 


If f(x) is nonnegative and integrable, differentiating its indefinite integral
∫_{-∞}^x f(t) dt leads back to f(x) except perhaps on a set of Lebesgue measure
0. That is the content of Theorem 31.3. The converse question is this: If F(x)
is nondecreasing and hence has almost everywhere a derivative F'(x), does
integrating F'(x) lead back to F(x)? As stated before, the answer turns out
to be no even if F(x) is assumed continuous:


Example 31.1. Let X_1, X_2, ... be independent, identically distributed
random variables such that P[X_n = 0] = p_0 and P[X_n = 1] = p_1 = 1 - p_0, and
let X = Σ_{n=1}^∞ X_n 2^{-n}. Let F(x) = P[X ≤ x] be the distribution function of X.
For an arbitrary sequence u_1, u_2, ... of 0's and 1's, P[X_n = u_n, n = 1, 2, ...] =
lim_n p_{u_1} ··· p_{u_n} = 0; since x can have at most two dyadic expansions
x = Σ_n u_n 2^{-n}, P[X = x] = 0. Thus F is everywhere continuous. Of course,
F(0) = 0 and F(1) = 1. For 0 ≤ k < 2^n, k2^{-n} has the form Σ_{i=1}^n u_i 2^{-i} for
some n-tuple (u_1, ..., u_n) of 0's and 1's. Since F is continuous,

(31.15)   F((k + 1)/2^n) - F(k/2^n) = P[k/2^n < X < (k + 1)/2^n]
                                    = P[X_i = u_i, i ≤ n] = p_{u_1} ··· p_{u_n}.


This shows that F is strictly increasing over the unit interval.
If p_0 = p_1 = ½, the right side of (31.15) is 2^{-n}, and a passage to the limit
shows that F(x) = x for 0 ≤ x ≤ 1. Assume, however, that p_0 ≠ p_1. It will be
shown that F'(x) = 0 except on a set of Lebesgue measure 0 in this case.
Obviously the derivative is 0 outside the unit interval, and by Theorem 31.2 it
exists almost everywhere inside it. Suppose then that 0 < x < 1 and that F
has a derivative F'(x) at x. It will be shown that F'(x) = 0.
For each n choose k_n so that x lies in the interval I_n = (k_n 2^{-n},
(k_n + 1)2^{-n}]; I_n is that dyadic interval of rank n that contains x. By (31.8),

  P[X ∈ I_n] / 2^{-n} = (F((k_n + 1)2^{-n}) - F(k_n 2^{-n})) / 2^{-n} → F'(x).



Graph of F(x) for po = .25, p, = .75. Because of the recursion (31.17), the part of the graph 
over [0,.5] and the part over [.5,1] are identical, apart from changes in scale, with the whole 
graph. Each segment of the curve therefore contains scaled copies of the whole; the extreme 
irregularity this implies is obscured by the fact that the accuracy is only to within the width of the 
printed line. 


If F'(x) is distinct from 0, the ratio of two successive terms here must go to 1,
so that

(31.16)   P[X ∈ I_{n+1}] / P[X ∈ I_n] → ½.


If I_n consists of the reals with nonterminating base-2 expansions beginning
with the digits u_1, ..., u_n, then P[X ∈ I_n] = p_{u_1} ··· p_{u_n} by (31.15). But I_{n+1}
must for some u_{n+1} consist of the reals beginning u_1, ..., u_n, u_{n+1} (u_{n+1} is 1
or 0 according as x lies to the right of the midpoint of I_n or not). Thus
P[X ∈ I_{n+1}] / P[X ∈ I_n] = p_{u_{n+1}} is either p_0 or p_1, and (31.16) is possible only
if p_0 = p_1, which was excluded by hypothesis.
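
The behavior of the ratios P[X ∈ I_n]/2^{-n} can be watched at a specific point. The following Python sketch, with assumed helper names dyadic_digits and quotients and the arbitrarily chosen point x = 1/3 and value p_0 = 1/4, computes these ratios exactly from the product formula in (31.15). It illustrates the dyadic difference quotients only; it does not by itself show that F'(1/3) exists.

```python
from fractions import Fraction


def dyadic_digits(x, n):
    """First n digits of the nonterminating dyadic expansion of x in (0, 1]."""
    digits = []
    for _ in range(n):
        x = 2 * x
        if x > 1:  # strict inequality selects the nonterminating expansion
            digits.append(1)
            x -= 1
        else:
            digits.append(0)
    return digits


def quotients(x, p0, n):
    """Ratios P[X in I_k] / 2^{-k} for k = 1..n, where I_k is the rank-k dyadic
    interval containing x and P[X in I_k] is the product in (31.15)."""
    p = (p0, 1 - p0)
    mass = Fraction(1)
    out = []
    for k, u in enumerate(dyadic_digits(x, n), start=1):
        mass *= p[u]
        out.append(mass * 2**k)
    return out


if __name__ == "__main__":
    for q in quotients(Fraction(1, 3), Fraction(1, 4), 8):
        print(q, float(q))
```

For x = 1/3 the digits alternate 0, 1, 0, 1, ..., so successive ratios of these terms alternate between 2p_0 = 1/2 and 2p_1 = 3/2 and never approach 1, while the terms themselves decay like (3/4)^{n/2}.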

Thus F is continuous and strictly increasing over [0, 1], but F'(x) = 0
except on a set of Lebesgue measure 0. For 0 ≤ x ≤ ½ independence gives
F(x) = P[X_1 = 0, Σ_{n=2}^∞ X_n 2^{-n+1} ≤ 2x] = p_0 F(2x). Similarly, F(x) = p_0 +
p_1 F(2x - 1) for ½ ≤ x ≤ 1. Thus

(31.17)   F(x) = p_0 F(2x)               if 0 ≤ x ≤ ½,
          F(x) = p_0 + p_1 F(2x - 1)     if ½ ≤ x ≤ 1.


In Section 7, F(x) (there denoted Q(x)) entered as the probability of success
at bold play; see (7.30) and (7.33). ∎
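
The recursion (31.17), together with F(0) = 0 and F(1) = 1, determines F exactly at every dyadic rational, since each step removes one binary digit. The following is a sketch in Python under that observation; the function name F, its argument order, and the max_steps guard are illustrative choices, not notation from the text.

```python
from fractions import Fraction


def F(x, p0, max_steps=10_000):
    """Exact value of the distribution function of Example 31.1 at a dyadic
    rational x in [0, 1], obtained by unrolling the recursion (31.17)."""
    x, p0 = Fraction(x), Fraction(p0)
    p1 = 1 - p0
    total, scale = Fraction(0), Fraction(1)
    for _ in range(max_steps):
        if x == 0 or x == 1:
            return total + scale * x  # uses F(0) = 0 and F(1) = 1
        if x <= Fraction(1, 2):
            x, scale = 2 * x, scale * p0  # F(x) = p0 * F(2x)
        else:
            # F(x) = p0 + p1 * F(2x - 1)
            total, x, scale = total + scale * p0, 2 * x - 1, scale * p1
    raise ValueError("recursion did not terminate; x is not a dyadic rational")


if __name__ == "__main__":
    p0 = Fraction(1, 4)
    for k in range(9):
        print(Fraction(k, 8), F(Fraction(k, 8), p0))
```

With p_0 = 1/4 this gives F(1/2) = 1/4 and F(3/4) = 7/16, so the increment over (1/2, 3/4] is 3/16 = p_1 p_0, in agreement with (31.15); with p_0 = 1/2 the same routine returns F(x) = x.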


A function is singular if it has derivative 0 except on a set of Lebesgue 
measure 0. Of course, a step function constant over intervals is singular. 
What is remarkable (indeed, singular) about the function in the preceding 
example is that it is continuous and strictly increasing but nonetheless has 
derivative 0 except on a set of Lebesgue measure 0. Note that there is strict 
inequality in (31.9) for this F. 

Further properties of nondecreasing functions can be discovered through
a study of the measures they generate. Assume from now on that F is
nondecreasing, that F is continuous from the right (this is only a normalization),
and that 0 = lim_{x→-∞} F(x) ≤ lim_{x→∞} F(x) = m < ∞. Call such an F
a distribution function, even though m need not be 1. By Theorem 12.4 there
exists a unique measure μ on the Borel sets of the line for which

(31.18)   μ(a, b] = F(b) - F(a).


Of course, μ(R^1) = m is finite.
The larger F' is, the larger μ is:


Theorem 31.4. Suppose that F and μ are related by (31.18) and that F'(x)
exists throughout a Borel set A.


(i) If F'(x) ≤ α for x ∈ A, then μ(A) ≤ αλ(A).
(ii) If F'(x) ≥ α for x ∈ A, then μ(A) ≥ αλ(A).


PROOF. It is no restriction to assume A bounded. Fix ε for the moment.
Let E be a countable, dense set, and let A_n = ∩(A - I), where the intersection
extends over the intervals I = (u, v] for which u, v ∈ E, 0 < λ(I) < n^{-1},
and the inequality

(31.19)   μ(I) ≤ (α + ε)λ(I)


fails. Then A_n is a Borel set and (see (31.8)) A_n ↑ A under the hypothesis of (i).
By Theorem 11.4 there exist disjoint intervals I_{nk} (open on the left, closed on
the right) such that A_n ⊂ ∪_k I_{nk} and

(31.20)   Σ_k λ(I_{nk}) < λ(A_n) + ε.


It is no restriction to assume that each I_{nk} has endpoints in E, meets A_n,
and satisfies λ(I_{nk}) < n^{-1}. Then (31.19) applies to each I_{nk}, and hence

  μ(A_n) ≤ Σ_k μ(I_{nk}) ≤ (α + ε) Σ_k λ(I_{nk}) ≤ (α + ε)(λ(A_n) + ε).


In the extreme terms here let n → ∞ and then ε → 0; (i) follows.

To prove (ii), let the countable, dense set E contain all the discontinuity
points of F, and use the same argument with μ(I) ≥ (α - ε)λ(I) in place of
(31.19) and Σ_k μ(I_{nk}) < μ(A_n) + ε in place of (31.20). Since E contains all
the discontinuity points of F, it is again no restriction to assume that each I_{nk}
has endpoints in E, meets A_n, and satisfies λ(I_{nk}) < n^{-1}. It follows that

  μ(A_n) + ε > Σ_k μ(I_{nk}) ≥ (α - ε) Σ_k λ(I_{nk}) ≥ (α - ε)λ(A_n).


Again let n → ∞ and then ε → 0. ∎


The measures μ and λ have disjoint supports if there exist Borel sets S_μ
and S_λ such that

(31.21)   μ(R^1 - S_μ) = 0,   λ(R^1 - S_λ) = 0,   S_μ ∩ S_λ = ∅.

Theorem 31.5. Suppose that F and μ are related by (31.18). A necessary
and sufficient condition for μ and λ to have disjoint supports is that F'(x) = 0
except on a set of Lebesgue measure 0.


PROOF. By Theorem 31.4, μ[x: |x| ≤ a, F'(x) ≤ ε] ≤ 2aε, and so (let
ε → 0 and then a → ∞) μ[x: F'(x) = 0] = 0. If F'(x) = 0 outside a set of
Lebesgue measure 0, then S_λ = [x: F'(x) = 0] and S_μ = R^1 - S_λ satisfy (31.21).
Suppose that there exist S_μ and S_λ satisfying (31.21). By the other half of
Theorem 31.4, ελ[x: F'(x) ≥ ε] = ελ[x: x ∈ S_λ, F'(x) ≥ ε] ≤ μ(S_λ) = 0, and
so (let ε → 0) F'(x) = 0 except on a set of Lebesgue measure 0. ∎


Example 31.2. Suppose that μ is discrete, consisting of a mass m_k at each
of countably many points x_k. Then F(x) = Σ m_k, the sum extending over the
k for which x_k ≤ x. Certainly, μ and λ have disjoint supports, and so F' must
vanish except on a set of Lebesgue measure 0. This is directly obvious if the
x_k have no limit points, but not, for example, if they are dense. ∎


Example 31.3. Consider again the distribution function F in Example
31.1. Here μ(A) = P[X ∈ A]. Since F is singular, μ and λ have disjoint
supports. This fact has an interesting direct probabilistic proof.

For x in the unit interval, let d_1(x), d_2(x), ... be the digits in its
nonterminating dyadic expansion, as in Section 1. If (k2^{-n}, (k + 1)2^{-n}] is the
dyadic interval of rank n consisting of the reals whose expansions begin with the
digits u_1, ..., u_n, then, by (31.15),

(31.22)   μ(k/2^n, (k + 1)/2^n] = μ[x: d_i(x) = u_i, i ≤ n] = p_{u_1} ··· p_{u_n}.


If the unit interval is regarded as a probability space under the measure μ,
then the d_i(x) become random variables, and (31.22) says that these random
variables are independent and identically distributed and μ[x: d_i(x) = 0] = p_0,
μ[x: d_i(x) = 1] = p_1.

Since these random variables have expected value p_1, the strong law of
large numbers implies that their averages go to p_1 with probability 1:

(31.23)   μ[x ∈ (0, 1]: lim_n (1/n) Σ_{i=1}^n d_i(x) = p_1] = 1.

On the other hand, by the normal number theorem,

(31.24)   λ[x ∈ (0, 1]: lim_n (1/n) Σ_{i=1}^n d_i(x) = ½] = 1.


(Of course, (31.24) is just (31.23) for the special case p_0 = p_1 = ½; in this case
μ and λ coincide in the unit interval.) If p_1 ≠ ½, the sets in (31.23) and
(31.24) are disjoint, so that μ and λ do have disjoint supports.

It was shown in Example 31.1 that if F'(x) exists at all (0 < x < 1), then it
is 0. By part (i) of Theorem 31.4 the set where F'(x) fails to exist therefore
has μ-measure 1; in particular, this set is uncountable. ∎
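
The separation of μ and λ in this example can be made quantitative at a finite rank n. Consider the union of the rank-n dyadic intervals whose first n digits contain at least tn ones, with ½ < t < p_1. By (31.22) its μ-measure is a binomial tail with success probability p_1, and its λ-measure is the same tail with success probability ½. The sketch below, in Python, evaluates both exactly; the function name masses and the particular values n = 200, p_1 = 3/4, t = 5/8 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def masses(n, p1, t):
    """(λ, μ) of the union of rank-n dyadic intervals whose first n digits
    contain at least t*n ones; μ is the measure of Example 31.3 with P[digit = 1] = p1."""
    p1 = Fraction(p1)
    p0 = 1 - p1
    js = [j for j in range(n + 1) if j >= t * n]
    lam = sum(Fraction(comb(n, j), 2**n) for j in js)
    mu = sum(comb(n, j) * p1**j * p0 ** (n - j) for j in js)
    return lam, mu


if __name__ == "__main__":
    lam, mu = masses(200, Fraction(3, 4), Fraction(5, 8))
    print(float(lam), float(mu))
```

For these values the set has Lebesgue measure below 1/100 and μ-measure above 99/100, which is the finite-n shadow of the disjoint supports in (31.23) and (31.24).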


In the singular case, according to Theorem 31.5, F' vanishes on a support
of λ. It is natural to ask for the size of F' on a support of μ. If B is the x-set
where F has a finite derivative, and if (31.21) holds, then by Theorem 31.4,
μ[x ∈ B: F'(x) ≤ n] = μ[x ∈ B ∩ S_μ: F'(x) ≤ n] ≤ nλ(S_μ) = 0, and hence
μ(B) = 0. The next theorem goes further.


Theorem 31.6. Suppose that F and μ are related by (31.18) and that μ
and λ have disjoint supports. Then, except for x in a set of μ-measure 0,
^FD(x) = ∞.


If μ has finite support, then clearly ^FD(x) = ∞ if μ{x} > 0, while D_F(x) = 0
for all x. Since F is continuous from the right, ^FD and D_F play different
roles.


PROOF. Let A_n be the set where ^FD(x) < n. The problem is to prove
that μ(A_n) = 0, and by (31.21) it is enough to prove that μ(A_n ∩ S_μ) = 0.
Further, by Theorem 12.3 it is enough to prove that μ(K) = 0 if K is a
compact subset of A_n ∩ S_μ.

Fix ε. Since λ(K) = 0, there is an open G such that K ⊂ G and λ(G) < ε.
If x ∈ K, then x ∈ A_n, and by the definition of ^FD and the right-continuity of
F, there is an open interval I_x for which x ∈ I_x ⊂ G and μ(I_x) < nλ(I_x). By
compactness, K has a finite subcover I_{x_1}, ..., I_{x_k}. If some three of these have
a nonempty intersection, one of them must be contained in the union of the
other two. Such superfluous intervals can be removed from the subcover, and
it is therefore possible to assume that no point of K lies in more than two of
the I_{x_i}. But then

  μ(K) ≤ μ(∪_i I_{x_i}) ≤ Σ_i μ(I_{x_i}) < n Σ_i λ(I_{x_i})
       ≤ 2nλ(∪_i I_{x_i}) ≤ 2nλ(G) < 2nε.

Since ε was arbitrary, μ(K) = 0, as required. ∎


Example 31.4. Restrict the F of Examples 31.1 and 31.3 to (0, 1), and let g be the
inverse. Thus F and g are continuous, strictly increasing mappings of (0, 1) onto itself.
If A = [x ∈ (0, 1): F'(x) = 0], then λ(A) = 1, as shown in the examples, while μ(A) =
0. Let H be a set in (0, 1) that is not a Lebesgue set. Since H - A is contained in a set
of Lebesgue measure 0, it is a Lebesgue set; hence H_0 = H ∩ A is not a Lebesgue set,
since otherwise H = H_0 ∪ (H - A) would be a Lebesgue set. If B = (0, x], then
λg^{-1}(B) = λ(0, F(x)] = F(x) = μ(B), and it follows that λg^{-1}(B) = μ(B) for all Borel
sets B. Since g^{-1}H_0 is a subset of g^{-1}A and λ(g^{-1}A) = μ(A) = 0, g^{-1}H_0 is a
Lebesgue set. On the other hand, if g^{-1}H_0 were a Borel set, H_0 = F^{-1}(g^{-1}H_0) would
also be a Borel set. Thus g^{-1}H_0 provides an example of a Lebesgue set that is not a
Borel set.‡ ∎


Integrals of Derivatives 


Return now to the problem of extending part (ii) of Theorem 31.1, to the
problem of characterizing those distribution functions F for which F' integrates
back to F:

(31.25)   F(x) = ∫_{-∞}^x F'(t) dt.


†See Problem 31.8.
‡For a different argument, see Problem 3.14.


The first step is easy: If (31.25) holds, then F has the form

(31.26)   F(x) = ∫_{-∞}^x f(t) dt


for a nonnegative, integrable f (a density), namely f = F'. On the other hand,
(31.26) implies by Theorem 31.3 that F' = f outside a set of Lebesgue
measure 0, whence (31.25) follows. Thus (31.25) holds if and only if F has the
form (31.26) for some f, and the problem is to characterize functions of this
form. The function of Example 31.1 is not among them.

As observed earlier (see (31.3)), an F of the form (31.26) with f integrable
is continuous. It has a still stronger property: For each ε there exists a δ such
that

(31.27)   ∫_A f(x) dx < ε   if λ(A) < δ.


Indeed, if A_n = [x: f(x) > n], then A_n ↓ ∅, and since f is integrable, the
dominated convergence theorem implies that ∫_{A_n} f(x) dx < ε/2 for large n.
Fix such an n and take δ = ε/(2n). If λ(A) < δ, then ∫_A f(x) dx ≤
∫_{A-A_n} f(x) dx + ∫_{A_n} f(x) dx ≤ nλ(A) + ε/2 < ε.

If F is given by (31.26), then F(b) - F(a) = ∫_a^b f(x) dx, and (31.27) has this
consequence: For every ε there exists a δ such that for each finite collection
[a_i, b_i], i = 1, ..., k, of nonoverlapping† intervals,

(31.28)   Σ_{i=1}^k |F(b_i) - F(a_i)| < ε   if   Σ_{i=1}^k (b_i - a_i) < δ.


A function F with this property is said to be absolutely continuous.‡ A
function of the form (31.26) (f integrable) is thus absolutely continuous.

A continuous distribution function is uniformly continuous, and so for
every ε there is a δ such that the implication in (31.28) holds provided that
k = 1. The definition of absolute continuity requires this to hold whatever k
may be, which puts severe restrictions on F. Absolute continuity of F can be
characterized in terms of the measure μ:


Theorem 31.7. Suppose that F and μ are related by (31.18). Then F is
absolutely continuous in the sense of (31.28) if and only if μ(A) = 0 for every A
for which λ(A) = 0.


†Intervals are nonoverlapping if their interiors are disjoint. In this definition it is immaterial
whether the intervals are regarded as closed or open or half-open, since this has no effect on
(31.28).

‡The definition applies to all functions, not just to distribution functions. If F is a distribution
function as in the present discussion, the absolute-value bars in (31.28) are unnecessary.


PROOF. Suppose that F is absolutely continuous and that λ(A) = 0.
Given ε, choose δ so that (31.28) holds. There exists a countable disjoint
union B = ∪_k I_k of intervals such that A ⊂ B and λ(B) < δ. By (31.28) it
follows that μ(∪_{k=1}^n I_k) ≤ ε for each n and hence that μ(A) ≤ μ(B) ≤ ε.
Since ε was arbitrary, μ(A) = 0.

If F is not absolutely continuous, then there exists an ε such that for every
δ some finite disjoint union A of intervals satisfies λ(A) < δ and μ(A) ≥ ε.
Choose A_n so that λ(A_n) < n^{-2} and μ(A_n) ≥ ε. Then λ(lim sup_n A_n) = 0 by
the first Borel-Cantelli lemma (Theorem 4.3, the proof of which does not
require P to be a probability measure or even finite). On the other hand,
μ(lim sup_n A_n) ≥ ε > 0 by Theorem 4.1 (the proof of which applies because μ
is assumed finite). ∎


This result leads to a characterization of indefinite integrals. 


Theorem 31.8. A distribution function F(x) has the form ∫_{-∞}^x f(t) dt for
an integrable f if and only if it is absolutely continuous in the sense of (31.28).


PROOF. That an F of the form (31.26) is absolutely continuous was
proved in the argument leading to the definition (31.28). For another proof,
apply Theorem 31.7: if F has this form, then λ(A) = 0 implies that μ(A) =
∫_A f(t) dt = 0.


To go the other way, define for any distribution function F

(31.29)   F_ac(x) = ∫_{-∞}^x F'(t) dt

and

(31.30)   F_s(x) = F(x) - F_ac(x).


Then F_s is right-continuous, and by (31.9) it is both nonnegative and
nondecreasing. Since F_ac comes from a density, it is absolutely continuous. By
Theorem 31.3, F_ac' = F' and hence F_s' = 0 except on a set of Lebesgue
measure 0. Thus F has a decomposition

(31.31)   F(x) = F_ac(x) + F_s(x),


where F_ac has a density and hence is absolutely continuous and F_s is singular.
This is called the Lebesgue decomposition.

Suppose that F is absolutely continuous. Then F_s of (31.30) must, as the
difference of absolutely continuous functions, be absolutely continuous itself.
If it can be shown that F_s is identically 0, it will follow that F = F_ac has the
required form. It thus suffices to show that a distribution function that is both
absolutely continuous and singular must vanish.


If a distribution function F is singular, then by Theorem 31.5 there are
disjoint supports S_μ and S_λ. But if F is also absolutely continuous, then from
λ(S_μ) = 0 it follows by Theorem 31.7 that μ(S_μ) = 0. But then μ(R^1) = 0, and
so F(x) ≡ 0. ∎


This theorem identifies the distribution functions that are integrals of their
derivatives as the absolutely continuous functions. Theorem 31.7, on the
other hand, characterizes absolute continuity in a way that extends to spaces
Ω without the geometric structure of the line necessary to a treatment
involving distribution functions and ordinary derivatives.† The extension is
studied in Section 32.


Functions of Bounded Variation 


The remainder of this section briefly sketches the extension of the preceding theory to 
functions that are not monotone. The results are for simplicity given only for a finite 
interval [a,b] and for functions F on [a, b] satisfying F(a) = 0. 

If F(x) = ∫_a^x f(t) dt is an indefinite integral, where f is integrable but not
necessarily nonnegative, then F(x) = ∫_a^x f^+(t) dt - ∫_a^x f^-(t) dt exhibits F as the
difference of two nondecreasing functions. The problem of characterizing
indefinite integrals thus leads to the preliminary problem of characterizing
functions representable as a difference of nondecreasing functions.

Now F is said to be of bounded variation over [a, b] if sup_Δ ||F||_Δ is finite, where
||F||_Δ is defined by (31.5) and Δ ranges over all partitions (31.4) of [a, b]. Clearly, a
difference of nondecreasing functions is of bounded variation. But the converse holds
as well: For every finite collection Γ of nonoverlapping intervals [x_i, y_i] in [a, b], put

  P_Γ = Σ (F(y_i) - F(x_i))^+,   N_Γ = Σ (F(y_i) - F(x_i))^-.

Now define

  P(x) = sup_Γ P_Γ,   N(x) = sup_Γ N_Γ,

where the suprema extend over partitions Γ of [a, x]. If F is of bounded variation,
then P(x) and N(x) are finite. For each such Γ, P_Γ = N_Γ + F(x). This gives the
inequalities

  P_Γ ≤ N(x) + F(x),   P(x) ≥ N_Γ + F(x),

which in turn lead to the inequalities

  P(x) ≤ N(x) + F(x),   P(x) ≥ N(x) + F(x).

Thus

(31.32)   F(x) = P(x) - N(x)


gives the required representation: A function is the difference of two nondecreasing
functions if and only if it is of bounded variation.
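
The identity P_Γ = N_Γ + F(x) behind (31.32) is easy to check for a single partition. The following Python sketch assumes F is known at the points of one partition Γ of [a, x] with F(a) = 0; the helper name pos_neg_variation and the sample values are illustrative only, and the code computes P_Γ and N_Γ for that one partition, not the suprema P(x) and N(x).

```python
def pos_neg_variation(values):
    """P_Γ and N_Γ for the partition whose points carry the given values of F:
    sums of the positive and negative parts of the successive increments."""
    P = N = 0
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        if diff > 0:
            P += diff
        else:
            N += -diff
    return P, N


if __name__ == "__main__":
    # Hypothetical values of F at the partition points, with F(a) = 0.
    values = [0, 2, -1, 3, 1]
    P, N = pos_neg_variation(values)
    print(P, N, P - N, P + N)
```

Here P_Γ = 6 and N_Γ = 5, so P_Γ - N_Γ = 1 is the value of F at the right endpoint, and P_Γ + N_Γ = 11 is the sum of the absolute increments over Γ.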


†Theorems 31.3 and 31.8 do have geometric analogues in R^k; see RUDIN, Chapter 8.


If T_Γ = P_Γ + N_Γ, then T_Γ = Σ |F(y_i) - F(x_i)|. According to the definition (31.28), F
is absolutely continuous if for every ε there exists a δ such that T_Γ < ε whenever the
intervals in the collection Γ have total length less than δ. If F is absolutely
continuous, take the δ corresponding to an ε of 1 and decompose [a, b] into a finite
number, say n, of subintervals [u_{j-1}, u_j] of lengths less than δ. Any partition Δ of
[a, b] can by the insertion of the u_j be split into n sets of intervals each of total length
less than δ, and it follows that ||F||_Δ ≤ n. Therefore, an absolutely continuous
function is necessarily of bounded variation.

An absolutely continuous F thus has a representation (31.32). It follows by the
definitions that $P(y) - P(x)$ is at most $\sup_\Gamma T_\Gamma$, where $\Gamma$ ranges over the partitions of
[x, y]. If $[x_i, y_i]$ are nonoverlapping intervals, then $\sum_i (P(y_i) - P(x_i))$ is at most
$\sup_\Gamma T_\Gamma$, where now $\Gamma$ ranges over the collections of intervals that partition each of
the $[x_i, y_i]$. Therefore, if F is absolutely continuous, there exists for each $\epsilon$ a $\delta$ such
that $\sum_i (y_i - x_i) < \delta$ implies that $\sum_i (P(y_i) - P(x_i)) < \epsilon$. In other words, P is absolutely
continuous. Similarly, N is absolutely continuous.

Thus an absolutely continuous F is the difference of two nondecreasing absolutely 
continuous functions. By Theorem 31.8, each of these is an indefinite integral, which 
implies that F is an indefinite integral as well: For an F on [a, b] satisfying F(a) = 0, 
absolute continuity is a necessary and sufficient condition for F to be an indefinite 
integral, that is, to have the form $F(x) = \int_a^x f(t)\,dt$ for an integrable f.


PROBLEMS 


31.1. Extend Examples 31.1 and 31.3: Let $p_0, \ldots, p_{r-1}$ be nonnegative numbers
adding to 1, where $r \ge 2$; suppose there is no i such that $p_i = 1$. Let $X_1$,
$X_2, \ldots$ be independent, identically distributed random variables such that
$P[X_n = i] = p_i$, $0 \le i < r$, and put $X = \sum_{n=1}^\infty X_n r^{-n}$. Let F be the distribution
function of X. Show that F is continuous. Show that F is strictly increasing
over the unit interval if and only if all the $p_i$ are strictly positive. Show that
$F(x) = x$ for $0 \le x \le 1$ if $p_i \equiv r^{-1}$ and that otherwise F is singular; prove
singularity by extending the arguments both of Example 31.1 and of Example
31.3. What is the analogue of (31.17)?


31.2. ↑ In Problem 31.1 take r = 3 and $p_0 = p_2 = \frac12$, $p_1 = 0$. The corresponding F
is called the Cantor function. The complement in [0, 1] of the Cantor set (see
Problems 1.5 and 3.16) consists of the middle third $(\frac13, \frac23)$, the middle thirds
$(\frac19, \frac29)$ and $(\frac79, \frac89)$, and so on. Show that F is $\frac12$ on the first of these intervals, $\frac14$
on the second, $\frac34$ on the third, and so on. Show by direct argument that $F' = 0$
except on a set of Lebesgue measure 0.


31.3. A real function f of a real variable is a Lebesgue function if [x: f(x) <a] is a 
Lebesgue set for each a. 


(a) Show that, if $f_1$ is a Borel function and $f_2$ is a Lebesgue function, then
the composition $f_1 f_2$ is a Lebesgue function.

(b) Show that there exists a Lebesgue function $f_1$ and a Lebesgue (even Borel,
even continuous) function $f_2$ such that $f_1 f_2$ is not a Lebesgue function. Hint:
Use Example 31.4.


†This uses the fact that $\|F\|_\Delta$ cannot decrease under passage to a finer partition.




31.4. ↑ An arbitrary function f on (0, 1] can be represented as a composition of a
Lebesgue function $f_1$ and a Borel function $f_2$. For x in (0, 1], let $d_n(x)$ be the
nth digit in its nonterminating dyadic expansion, and define $f_2(x) =
\sum_{n=1}^\infty 2 d_n(x)/3^n$. Show that $f_2$ is increasing and that $f_2(0, 1]$ is contained in the
Cantor set. Take $f_1(x)$ to be $f(f_2^{-1}(x))$ if $x \in f_2(0, 1]$ and 0 if $x \in (0, 1] - f_2(0, 1]$.
Now show that $f = f_1 f_2$.


31.5. Let $r_1, r_2, \ldots$ be an enumeration of the rationals in (0, 1) and put $F(x) =
\sum_{k: r_k \le x} 2^{-k}$. Define $\varphi$ by (14.5) and prove that it is continuous and singular.


31.6. Suppose that $\mu$ and F are related by (31.18). If F is not absolutely continuous,
then $\mu(A) > 0$ for some set A of Lebesgue measure 0. It is an interesting fact,
however, that almost all translates of A must have $\mu$-measure 0. From Fubini's
theorem and the fact that $\lambda$ is invariant under translation and reflection
through 0, show that, if $\lambda(A) = 0$ and $\mu$ is $\sigma$-finite, then $\mu(A + x) = 0$ for x
outside a set of Lebesgue measure 0.


31.7. 17.4 31.6↑ Show that F is absolutely continuous if and only if for each
Borel set A, $\mu(A + x)$ is continuous in x.


31.8. Let $F_*(x) = \lim_{\delta \to 0} \inf (F(v) - F(u))/(v - u)$, where the infimum extends
over u and v such that $u < x < v$ and $v - u < \delta$. Define $F^*(x)$ as this limit with
the infimum replaced by a supremum. Show that in Theorem 31.4, $F'$ can be
replaced by $F^*$ in part (i) and by $F_*$ in part (ii). Show that in Theorem 31.6,
${}_FD$ can be replaced by $F_*$ (note that $F_*(x) \le {}_FD(x)$).


31.9. Lebesgue's density theorem. A point x is a density point of a Borel set A if
$\lambda((u, v] \cap A)/(v - u) \to 1$ as $u \uparrow x$ and $v \downarrow x$. From Theorems 31.2 and 31.4
deduce that almost all points of A are density points. Similarly, $\lambda((u, v] \cap
A)/(v - u) \to 0$ almost everywhere on $A^c$.


31.10. Let $f: [a, b] \to R^k$ be an arc; $f(t) = (f_1(t), \ldots, f_k(t))$. Show that the arc is
rectifiable if and only if each $f_i$ is of bounded variation over [a, b].


31.11. ↑ Suppose that F is continuous and nondecreasing and that F(0) = 0,
F(1) = 1. Then $f(x) = (x, F(x))$ defines an arc $f: [0, 1] \to R^2$. It is easy to see
by monotonicity that the arc is rectifiable and that, in fact, its length satisfies
$L(f) \le 2$. It is also easy, given $\epsilon$, to produce functions F for which $L(f) > 2 - \epsilon$.
Show by the arguments in the proof of Theorem 31.4 that $L(f) = 2$ if F is
singular.


31.12. Suppose that the characteristic function of F satisfies $\limsup_{t \to \infty} |\varphi(t)| = 1$.
Show that F is singular. Compare the lattice case (Problem 26.1). Hint: Use
the Lebesgue decomposition and the Riemann-Lebesgue theorem.


31.13. Suppose that $X_1, X_2, \ldots$ are independent and assume the values $\pm 1$ with
probability $\frac12$ each, and let $X = \sum_{n=1}^\infty X_n/2^n$. Show that X is uniformly
distributed over [-1, +1]. Calculate the characteristic functions of X and $X_n$
and deduce (1.40). Conversely, establish (1.40) by trigonometry and conclude
that X is uniformly distributed over [-1, +1].




31.14. (a) Suppose that $X_1, X_2, \ldots$ are independent and assume the values 0 and 1
with probability $\frac12$ each. Let F and G be the distribution functions of
$\sum_{n=1}^\infty X_{2n-1}/2^{2n-1}$ and $\sum_{n=1}^\infty X_{2n}/2^{2n}$. Show that F and G are singular but
that $F * G$ is absolutely continuous.
(b) Show that the convolution of an absolutely continuous distribution function
with an arbitrary distribution function is absolutely continuous.


31.15. 31.2↑ Show that the Cantor function is the distribution function of
$\sum_{n=1}^\infty X_n/3^n$, where the $X_n$ are independent and assume the values 0 and 2
with probability $\frac12$ each. Express its characteristic function as an infinite
product.


31.16. Show for the F of Example 31.1 that ${}_FD(1) = \infty$ and $D^F(0) = 0$ if
$p_0 < \frac12$. From (31.17) deduce that ${}_FD(x) = \infty$ and $D^F(x) = 0$ for all dyadic
rationals x. Analyze the case $p_0 > \frac12$ and sketch the graph.


31.17. 6.14↑ Let F be as in Example 31.1, and let $\mu$ be the corresponding
probability measure on the unit interval. Let $d_n(x)$ be the nth digit in the
nonterminating binary expansion of x, and let $s_n(x) = \sum_{k=1}^n d_k(x)$. If $I_n(x)$ is
the dyadic interval of order n containing x, then

(31.33)    $-\frac{1}{n} \log \mu(I_n(x)) = -\left(1 - \frac{s_n(x)}{n}\right) \log p_0 - \frac{s_n(x)}{n} \log p_1.$

(a) Show that (31.33) converges on a set of $\mu$-measure 1 to the entropy
$h = -p_0 \log p_0 - p_1 \log p_1$. From the fact that this entropy is less than log 2 if
$p_0 \ne \frac12$, deduce in this case that on a set of $\mu$-measure 1, F does not have a
finite derivative.

(b) Show that (31.33) converges to $-\frac12 \log p_0 - \frac12 \log p_1$ on a set of Lebesgue
measure 1. If $p_0 \ne \frac12$, this limit exceeds log 2 (arithmetic versus geometric
means), and so $\mu(I_n(x))/2^{-n} \to 0$ except on a set of Lebesgue measure 0. This
does not prove that $F'(x)$ exists almost everywhere, but it does show that
except for x in a set of Lebesgue measure 0, if $F'(x)$ does exist, then it is 0.

(c) Show that, if (31.33) converges to l, then

(31.34)    $\lim_n \frac{\mu(I_n(x))}{(2^{-n})^\alpha} = \begin{cases} \infty & \text{if } \alpha > l/\log 2, \\ 0 & \text{if } \alpha < l/\log 2. \end{cases}$

If (31.34) holds, then (roughly) F satisfies a Lipschitz condition† of (exact)
order $l/\log 2$. Thus F satisfies a Lipschitz condition of order $h/\log 2$ on a set
of $\mu$-measure 1 and a Lipschitz condition of order $(-\frac12 \log p_0 - \frac12 \log p_1)/\log 2$
on a set of Lebesgue measure 1.


31.18. van der Waerden's continuous, nowhere differentiable function is $f(x) =
\sum_{k=0}^\infty a_k(x)$, where $a_0(x)$ is the distance from x to the nearest integer and
$a_k(x) = 2^{-k} a_0(2^k x)$. Show by the Weierstrass M-test that f is continuous. Use
(31.8) and the ideas in Example 31.1 to show that f is nowhere differentiable.


†A Lipschitz condition of order $\alpha$ holds at x if $F(x + h) - F(x) = O(|h|^\alpha)$ as $h \to 0$;
for $\alpha > 1$ this implies $F'(x) = 0$, and for $0 < \alpha < 1$ it is a smoothness condition
stronger than continuity and weaker than differentiability.




31.19. Show (see (31.31)) that (apart from addition of constants) a function can have
only one representation $F_1 + F_2$ with $F_1$ absolutely continuous and $F_2$ singular.


31.20. Show that the $F_s$ in the Lebesgue decomposition can be further split into
$F_{cs} + F_d$, where $F_{cs}$ is continuous and singular and $F_d$ increases only in jumps
in the sense that the corresponding measure is discrete. The complete
decomposition is then $F = F_{ac} + F_{cs} + F_d$.


31.21. (a) Suppose that $x_1 < x_2 < \cdots$ and $\sum_n |F(x_n)| = \infty$. Show that, if F assumes
the value 0 in each interval $(x_n, x_{n+1})$, then it is of unbounded variation.
(b) Define F over [0, 1] by F(0) = 0 and $F(x) = x^\alpha \sin x^{-1}$ for x > 0. For
which values of $\alpha$ is F of bounded variation?


31.22. 14.4↑ If f is nonnegative and Lebesgue integrable, then by Theorem 31.3
and (31.8), except for x in a set of Lebesgue measure 0,

(31.35)    $\frac{1}{v - u} \int_u^v f(t)\,dt \to f(x)$

if $u \le x \le v$, $u < v$, and $u, v \to x$. There is an analogue in which Lebesgue
measure is replaced by a general probability measure $\mu$: If f is nonnegative
and integrable with respect to $\mu$, then as $h \downarrow 0$,

(31.36)    $\frac{1}{\mu(x - h, x + h]} \int_{(x-h,\,x+h]} f(t)\,\mu(dt) \to f(x)$

on a set of $\mu$-measure 1. Let F be the distribution function corresponding to
$\mu$, and put $\varphi(u) = \inf[x: u \le F(x)]$ for 0 < u < 1 (see (14.5)). Deduce (31.36)
from (31.35) by change of variable and Problem 14.4.


SECTION 32. THE RADON-NIKODYM THEOREM 


If f is a nonnegative function on a measure space $(\Omega, \mathscr{F}, \mu)$, then $\nu(A) =
\int_A f\,d\mu$ defines another measure on $\mathscr{F}$. In the terminology of Section 16, $\nu$
has density f with respect to $\mu$; see (16.11). For each A in $\mathscr{F}$, $\mu(A) = 0$
implies that $\nu(A) = 0$. The purpose of this section is to show conversely that
if this last condition holds and $\nu$ and $\mu$ are $\sigma$-finite on $\mathscr{F}$, then $\nu$ has a
density with respect to $\mu$. This was proved for the case $(R^1, \mathscr{R}^1, \lambda)$ in
Theorems 31.7 and 31.8. The theory of the preceding section, although
illuminating, is not required here.


Additive Set Functions 


Throughout this section, $(\Omega, \mathscr{F})$ is a measurable space. All sets involved are
assumed as usual to lie in $\mathscr{F}$.




An additive set function is a function $\varphi$ from $\mathscr{F}$ to the reals for which

(32.1)    $\varphi\left(\bigcup_n A_n\right) = \sum_n \varphi(A_n)$

if $A_1, A_2, \ldots$ is a finite or infinite sequence of disjoint sets. A set function
differs from a measure in that the values $\varphi(A)$ may be negative but must be
finite: the special values $+\infty$ and $-\infty$ are prohibited. It will turn out that
the series on the right in (32.1) must in fact converge absolutely, but this need
not be assumed. Note that $\varphi(\emptyset) = 0$.


Example 32.1. If $\mu_1$ and $\mu_2$ are finite measures, then $\varphi(A) = \mu_1(A) -
\mu_2(A)$ is an additive set function. It will turn out that the general additive set
function has this form. A special case of this is $\varphi(A) = \int_A f\,d\mu$, where f is
integrable (not necessarily nonnegative). ■


The proof of the main theorem of this section (Theorem 32.2) requires 
certain facts about additive set functions, even though the statement of the 
theorem involves only measures. 


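Lemma 1. If $E_n \uparrow E$ or $E_n \downarrow E$, then $\varphi(E_n) \to \varphi(E)$.


PROOF. If $E_n \uparrow E$, then $\varphi(E) = \varphi(E_1 \cup \bigcup_{n=1}^\infty (E_{n+1} - E_n)) = \varphi(E_1) +
\sum_{n=1}^\infty \varphi(E_{n+1} - E_n) = \lim_k [\varphi(E_1) + \sum_{n=1}^{k-1} \varphi(E_{n+1} - E_n)] = \lim_k \varphi(E_k)$ by
(32.1). If $E_n \downarrow E$, then $E_n^c \uparrow E^c$, and hence $\varphi(E_n) = \varphi(\Omega) - \varphi(E_n^c) \to \varphi(\Omega) -
\varphi(E^c) = \varphi(E)$. ■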


Although this result is essentially the same as the corresponding ones for 
measures, it does require separate proof. Note that the limits need not be 
monotone unless $\varphi$ happens to be a measure.


The Hahn Decomposition 


Theorem 32.1. For any additive set function $\varphi$, there exist disjoint sets $A^+$
and $A^-$ such that $A^+ \cup A^- = \Omega$, $\varphi(E) \ge 0$ for all E in $A^+$, and $\varphi(E) \le 0$ for
all E in $A^-$.


A set A is positive if $\varphi(E) \ge 0$ for $E \subset A$ and negative if $\varphi(E) \le 0$ for
$E \subset A$. The $A^+$ and $A^-$ in the theorem decompose $\Omega$ into a positive and a
negative set. This is the Hahn decomposition.

If $\varphi(A) = \int_A f\,d\mu$ (see Example 32.1), the result is easy: take $A^+ = [f \ge 0]$
and $A^- = [f < 0]$.
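
The easy case just described can be checked by brute force on a finite space. The Python sketch below is only an illustration: the five-point space, the weights `mu`, the integrand `f`, and the helper names `phi` and `subsets` are all hypothetical choices, not taken from the text. It forms $A^+ = [f \ge 0]$ and $A^- = [f < 0]$ and verifies the sign conditions on every subset.

```python
from itertools import combinations

def phi(subset, f, mu):
    """phi(E) = integral of f over E with respect to mu (a finite sum here)."""
    return sum(f[w] * mu[w] for w in subset)

def subsets(points):
    """Yield every subset of a finite collection of points as a tuple."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield combo

# Hypothetical finite measure mu and integrable f on the space {0, 1, 2, 3, 4}.
mu = {0: 0.5, 1: 1.0, 2: 0.25, 3: 2.0, 4: 1.5}
f = {0: 3.0, 1: -1.0, 2: 0.0, 3: -0.5, 4: 2.0}

A_plus = [w for w in mu if f[w] >= 0]
A_minus = [w for w in mu if f[w] < 0]

# A_plus is a positive set and A_minus a negative set.
positive_ok = all(phi(E, f, mu) >= 0 for E in subsets(A_plus))
negative_ok = all(phi(E, f, mu) <= 0 for E in subsets(A_minus))

# A_plus attains the supremum of phi over all sets.
alpha = max(phi(E, f, mu) for E in subsets(mu))
print(positive_ok, negative_ok, phi(A_plus, f, mu), alpha)
```

The last line compares $\varphi(A^+)$ with the maximum of $\varphi$ over all subsets, which is the quantity the general proof calls $\alpha$.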




PROOF. Let $\alpha = \sup[\varphi(A): A \in \mathscr{F}]$. Suppose that there exists a set $A^+$
satisfying $\varphi(A^+) = \alpha$ (which implies that $\alpha$ is finite). Let $A^- = \Omega - A^+$. If
$A \subset A^+$ and $\varphi(A) < 0$, then $\varphi(A^+ - A) > \alpha$, an impossibility; hence $A^+$ is a
positive set. If $A \subset A^-$ and $\varphi(A) > 0$, then $\varphi(A^+ \cup A) > \alpha$, an impossibility;
hence $A^-$ is a negative set.

It is therefore only necessary to construct a set $A^+$ for which $\varphi(A^+) = \alpha$.
Choose sets $A_n$ such that $\varphi(A_n) \to \alpha$, and let $A = \bigcup_n A_n$. For each n
consider the $2^n$ sets $B_{ni}$ (some perhaps empty) that are intersections of the
form $\bigcap_{k=1}^n A_k'$, where each $A_k'$ is either $A_k$ or else $A - A_k$. The collection
$\mathscr{B}_n = [B_{ni}: 1 \le i \le 2^n]$ of these sets partitions A. Clearly, $\mathscr{B}_n$ refines $\mathscr{B}_{n-1}$:
each $B_{ni}$ is contained in exactly one of the $B_{n-1,j}$.

Let $C_n$ be the union of those $B_{ni}$ in $\mathscr{B}_n$ for which $\varphi(B_{ni}) > 0$. Since $A_n$ is
the union of certain of the $B_{ni}$, it follows that $\varphi(A_n) \le \varphi(C_n)$. Since the
partitions $\mathscr{B}_1, \mathscr{B}_2, \ldots$ are successively finer, m < n implies that $(C_m \cup \cdots \cup
C_{n-1} \cup C_n) - (C_m \cup \cdots \cup C_{n-1})$ is the union (perhaps empty) of certain of
the sets $B_{ni}$; the $B_{ni}$ in this union must satisfy $\varphi(B_{ni}) > 0$ because they are
contained in $C_n$. Therefore, $\varphi(C_m \cup \cdots \cup C_{n-1}) \le \varphi(C_m \cup \cdots \cup C_n)$, so that
by induction $\varphi(A_m) \le \varphi(C_m) \le \varphi(C_m \cup \cdots \cup C_n)$. If $D_m = \bigcup_{n=m}^\infty C_n$, then by
Lemma 1 (take $E_k = C_m \cup \cdots \cup C_{m+k}$) $\varphi(A_m) \le \varphi(D_m)$. Let $A^+ = \bigcap_{m=1}^\infty D_m$
(note that $A^+ = \limsup_n C_n$), so that $D_m \downarrow A^+$. By Lemma 1, $\alpha = \lim_m \varphi(A_m)
\le \lim_m \varphi(D_m) = \varphi(A^+)$. Thus $A^+$ does have maximal $\varphi$-value. ■


If $\varphi^+(A) = \varphi(A \cap A^+)$ and $\varphi^-(A) = -\varphi(A \cap A^-)$, then $\varphi^+$ and $\varphi^-$ are
finite measures. Thus

(32.2)    $\varphi(A) = \varphi^+(A) - \varphi^-(A)$

represents the set function $\varphi$ as the difference of two finite measures having
disjoint supports. If $E \subset A$, then $\varphi(E) \le \varphi^+(E) \le \varphi^+(A)$, and there is equality
if $E = A \cap A^+$. Therefore, $\varphi^+(A) = \sup_{E \subset A} \varphi(E)$. Similarly, $\varphi^-(A) =
-\inf_{E \subset A} \varphi(E)$. The measures $\varphi^+$ and $\varphi^-$ are called the upper and lower
variations of $\varphi$, and the measure $|\varphi|$ with value $\varphi^+(A) + \varphi^-(A)$ at A is
called the total variation. The representation (32.2) is the Jordan decomposition.


Absolute Continuity and Singularity 


Measures $\mu$ and $\nu$ on $(\Omega, \mathscr{F})$ are by definition mutually singular if they have
disjoint supports; that is, if there exist sets $S_\mu$ and $S_\nu$ such that

(32.3)    $\mu(\Omega - S_\mu) = 0, \quad \nu(\Omega - S_\nu) = 0, \quad S_\mu \cap S_\nu = \emptyset.$




In this case $\mu$ is also said to be singular with respect to $\nu$ and $\nu$ singular with
respect to $\mu$. Note that measures are automatically singular if one of them is
identically 0.

According to Theorem 31.5 a finite measure on $R^1$ with distribution
function F is singular with respect to Lebesgue measure in the sense of
(32.3) if and only if $F'(x) = 0$ except on a set of Lebesgue measure 0. In
Section 31 the latter condition was taken as the definition of singularity, but
of course it is the requirement of disjoint supports that can be generalized
from $R^1$ to an arbitrary $\Omega$.

The measure $\nu$ is absolutely continuous with respect to $\mu$ if for each A in
$\mathscr{F}$, $\mu(A) = 0$ implies $\nu(A) = 0$. In this case $\nu$ is also said to be dominated by
$\mu$, and the relation is indicated by $\nu \ll \mu$. If $\nu \ll \mu$ and $\mu \ll \nu$, the measures
are equivalent, indicated by $\nu \equiv \mu$.

A finite measure on the line is by Theorem 31.7 absolutely continuous in 
this sense with respect to Lebesgue measure if and only if the corresponding 
distribution function F satisfies the condition (31.28). The latter condition, 
taken in Section 31 as the definition of absolute continuity, is again not the 
one that generalizes from $R^1$ to $\Omega$.

There is an $\epsilon$-$\delta$ idea related to the definition of absolute continuity given
above. Suppose that for every $\epsilon$ there exists a $\delta$ such that

(32.4)    $\nu(A) < \epsilon$ if $\mu(A) < \delta$.

If this condition holds, $\mu(A) = 0$ implies that $\nu(A) < \epsilon$ for all $\epsilon$, and so
$\nu \ll \mu$. Suppose, on the other hand, that this condition fails and that $\nu$ is
finite. Then for some $\epsilon$ there exist sets $A_n$ such that $\mu(A_n) < n^{-2}$ and
$\nu(A_n) \ge \epsilon$. If $A = \limsup_n A_n$, then $\mu(A) = 0$ by the first Borel-Cantelli
lemma (which applies to arbitrary measures), but $\nu(A) \ge \epsilon > 0$ by the
right-hand inequality in (4.9) (which applies because $\nu$ is finite). Hence $\nu \ll \mu$ fails,
and so (32.4) follows if $\nu$ is finite and $\nu \ll \mu$. If $\nu$ is finite, in order that
$\nu \ll \mu$, it is therefore necessary and sufficient that for every $\epsilon$ there exist a $\delta$
satisfying (32.4). This condition is not suitable as a definition, because it need
not follow from $\nu \ll \mu$ if $\nu$ is infinite.†
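
The failure in the infinite case can be seen concretely in the setting of the hint to Problem 32.3: on the positive integers, let $\nu$ be counting measure and let $\mu$ put mass $n^{-2}$ at n. Here $\mu(A) = 0$ only for empty A, so $\nu \ll \mu$, yet singletons have arbitrarily small $\mu$-measure and $\nu$-measure 1. The Python sketch below merely exhibits such singletons; the function names and the sample values of $\delta$ are assumptions made for illustration.

```python
def mu_point(n):
    """mu-mass of the single point n (mass n**-2, as in Problem 32.3)."""
    return 1.0 / (n * n)

def nu(points):
    """Counting measure of a finite set of integers."""
    return len(points)

def witness(delta):
    """Smallest positive integer n whose singleton has mu-measure below delta."""
    n = 1
    while mu_point(n) >= delta:
        n += 1
    return n

# For each delta there is a set with mu-measure < delta but nu-measure 1,
# so no delta can work in (32.4) for any epsilon <= 1.
for delta in (0.5, 1e-2, 1e-6):
    n = witness(delta)
    print(delta, n, mu_point(n), nu({n}))
```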


The Main Theorem 


If $\nu(A) = \int_A f\,d\mu$, then certainly $\nu \ll \mu$. The Radon-Nikodym theorem goes
in the opposite direction:


Theorem 32.2. If $\mu$ and $\nu$ are $\sigma$-finite measures such that $\nu \ll \mu$, then
there exists a nonnegative f, a density, such that $\nu(A) = \int_A f\,d\mu$ for all $A \in \mathscr{F}$.
For two such densities f and g, $\mu[f \ne g] = 0$.


†See Problem 32.3.




The uniqueness of the density up to sets of $\mu$-measure 0 is settled by
Theorem 16.10. It is only the existence that must be proved.

The density f is integrable $\mu$ if and only if $\nu$ is finite. But since f is
integrable $\mu$ over A if $\nu(A) < \infty$, and since $\nu$ is assumed $\sigma$-finite, $f < \infty$
except on a set of $\mu$-measure 0; and f can be taken finite everywhere. By
Theorem 16.11, integrals with respect to $\nu$ can be calculated by the formula

(32.5)    $\int_A h\,d\nu = \int_A h f\,d\mu.$


The density whose existence is to be proved is called the Radon-Nikodym
derivative of $\nu$ with respect to $\mu$ and is often denoted $d\nu/d\mu$. The term
derivative is appropriate because of Theorems 31.3 and 31.8: For an absolutely
continuous distribution function F on the line, the corresponding
measure $\mu$ has with respect to Lebesgue measure the Radon-Nikodym
derivative $F'$. Note that (32.5) can be written

(32.6)    $\int_A h\,d\nu = \int_A h \frac{d\nu}{d\mu}\,d\mu.$
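
On a finite space the density is simply a ratio of point masses, and (32.5) reduces to an identity between finite sums. The Python sketch below checks this in one assumed example; the four-point space, the masses in `mu` and `nu`, the integrand `h`, and the helper names are hypothetical, and the value 0 assigned to the density where $\mu$ vanishes is an arbitrary choice on a $\mu$-null set.

```python
# Hypothetical point masses; nu vanishes wherever mu does, so nu << mu.
mu = {"a": 0.2, "b": 0.3, "c": 0.5, "d": 0.0}
nu = {"a": 0.1, "b": 0.6, "c": 0.25, "d": 0.0}

def density(nu, mu):
    """Pointwise ratio nu/mu, set to 0 on the mu-null points."""
    return {w: (nu[w] / mu[w] if mu[w] > 0 else 0.0) for w in mu}

def integral(h, measure, A):
    """Integral of h over the set A with respect to a discrete measure."""
    return sum(h(w) * measure[w] for w in A)

f = density(nu, mu)
h = {"a": 1.0, "b": -2.0, "c": 4.0, "d": 7.0}
A = ["a", "c", "d"]

lhs = integral(lambda w: h[w], nu, A)          # integral of h with respect to nu
rhs = integral(lambda w: h[w] * f[w], mu, A)   # integral of h*f with respect to mu
print(lhs, rhs)
```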


Suppose that Theorem 32.2 holds for finite $\mu$ and $\nu$ (which is in fact
enough for the probabilistic applications in the sections that follow). In the
$\sigma$-finite case there is a countable decomposition of $\Omega$ into $\mathscr{F}$-sets $A_n$ for
which $\mu(A_n)$ and $\nu(A_n)$ are both finite. If

(32.7)    $\mu_n(A) = \mu(A \cap A_n), \qquad \nu_n(A) = \nu(A \cap A_n),$

then $\nu \ll \mu$ implies $\nu_n \ll \mu_n$, and so $\nu_n(A) = \int_A f_n\,d\mu_n$ for some density $f_n$.
Since $\mu_n$ has density $I_{A_n}$ with respect to $\mu$ (Example 16.9),

$$\nu(A) = \sum_n \nu_n(A) = \sum_n \int_A f_n\,d\mu_n = \sum_n \int_A f_n I_{A_n}\,d\mu = \int_A \sum_n f_n I_{A_n}\,d\mu.$$

Thus $\sum_n f_n I_{A_n}$ is the density sought.


It is therefore enough to treat finite $\mu$ and $\nu$. This requires a preliminary
result.


Lemma 2. If $\mu$ and $\nu$ are finite measures and are not mutually singular,
then there exists a set A and a positive $\epsilon$ such that $\mu(A) > 0$ and $\epsilon\mu(E) \le \nu(E)$
for all $E \subset A$.




PROOF. Let $A_n^+ \cup A_n^-$ be a Hahn decomposition for the set function
$\nu - n^{-1}\mu$; put $M = \bigcup_n A_n^+$, so that $M^c = \bigcap_n A_n^-$. Since $M^c$ is in the negative
set $A_n^-$ for $\nu - n^{-1}\mu$, it follows that $\nu(M^c) \le n^{-1}\mu(M^c)$; since this holds for
all n, $\nu(M^c) = 0$. Thus M supports $\nu$, and from the fact that $\mu$ and $\nu$ are not
mutually singular it follows that $M^c$ cannot support $\mu$; that is, that $\mu(M)$
must be positive. Therefore, $\mu(A_n^+) > 0$ for some n. Take $A = A_n^+$ and
$\epsilon = n^{-1}$. ■


Example 32.2. Suppose that $(\Omega, \mathscr{F}) = (R^1, \mathscr{R}^1)$, $\mu$ is Lebesgue measure
$\lambda$, and $\nu(a, b] = F(b) - F(a)$. If $\nu$ and $\lambda$ do not have disjoint supports, then
by Theorem 31.5, $\lambda[x: F'(x) > 0] > 0$ and hence for some $\epsilon$, $A = [x: F'(x) >
\epsilon]$ satisfies $\lambda(A) > 0$. If E = (a, b] is a sufficiently small interval about an x in
A, then $\nu(E)/\lambda(E) = (F(b) - F(a))/(b - a) \ge \epsilon$, which is the same thing as
$\epsilon\lambda(E) \le \nu(E)$. ■


Thus Lemma 2 ties in with derivatives and quotients $\nu(E)/\mu(E)$ for
"small" sets E. Martingale theory links Radon-Nikodym derivatives with 
such quotients; see Theorem 35.7 and Example 35.10. 


PROOF OF THEOREM 32.2. Suppose that $\mu$ and $\nu$ are finite measures
satisfying $\nu \ll \mu$. Let $\mathscr{G}$ be the class of nonnegative functions g such that
$\int_E g\,d\mu \le \nu(E)$ for all E. If g and $g'$ lie in $\mathscr{G}$, then $\max(g, g')$ also lies in $\mathscr{G}$
because

$$\int_E \max(g, g')\,d\mu = \int_{E \cap [g \ge g']} g\,d\mu + \int_{E \cap [g' > g]} g'\,d\mu
\le \nu(E \cap [g \ge g']) + \nu(E \cap [g' > g]) = \nu(E).$$

Thus $\mathscr{G}$ is closed under the formation of finite maxima. Suppose that
functions $g_n$ lie in $\mathscr{G}$ and $g_n \uparrow g$. Then $\int_E g\,d\mu = \lim_n \int_E g_n\,d\mu \le \nu(E)$ by the
monotone convergence theorem, so that g lies in $\mathscr{G}$. Thus $\mathscr{G}$ is closed under
nondecreasing passages to the limit.

Let $\alpha = \sup \int g\,d\mu$ for g ranging over $\mathscr{G}$ ($\alpha \le \nu(\Omega)$). Choose $g_n$ in $\mathscr{G}$ so
that $\int g_n\,d\mu > \alpha - n^{-1}$. If $f_n = \max(g_1, \ldots, g_n)$ and $f = \lim f_n$, then f lies in
$\mathscr{G}$ and $\int f\,d\mu = \lim_n \int f_n\,d\mu \ge \lim_n \int g_n\,d\mu = \alpha$. Thus f is an element of $\mathscr{G}$ for
which $\int f\,d\mu$ is maximal.

Define $\nu_{ac}$ by $\nu_{ac}(E) = \int_E f\,d\mu$ and $\nu_s$ by $\nu_s(E) = \nu(E) - \nu_{ac}(E)$. Thus

(32.8)    $\nu(E) = \nu_{ac}(E) + \nu_s(E) = \int_E f\,d\mu + \nu_s(E).$

Since f is in $\mathscr{G}$, $\nu_s$ as well as $\nu_{ac}$ is a finite measure; that is, nonnegative. Of
course, $\nu_{ac}$ is absolutely continuous with respect to $\mu$.




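Suppose that $\nu_s$ fails to be singular with respect to $\mu$. It then follows from
Lemma 2 that there are a set A and a positive $\epsilon$ such that $\mu(A) > 0$ and
$\epsilon\mu(E) \le \nu_s(E)$ for all $E \subset A$. Then for every E

$$\int_E (f + \epsilon I_A)\,d\mu = \int_E f\,d\mu + \epsilon\mu(E \cap A) \le \int_E f\,d\mu + \nu_s(E \cap A)$$
$$= \int_{E \cap A} f\,d\mu + \nu_s(E \cap A) + \int_{E - A} f\,d\mu$$
$$= \nu(E \cap A) + \int_{E - A} f\,d\mu \le \nu(E \cap A) + \nu(E - A)$$
$$= \nu(E).$$

In other words, $f + \epsilon I_A$ lies in $\mathscr{G}$; since $\int (f + \epsilon I_A)\,d\mu = \alpha + \epsilon\mu(A) > \alpha$, this
contradicts the maximality of f.

Therefore, $\mu$ and $\nu_s$ are mutually singular, and there exists an S such that
$\nu_s(S) = \mu(S^c) = 0$. But since $\nu \ll \mu$, $\nu_s(S^c) \le \nu(S^c) = 0$, and so $\nu_s(\Omega) = 0$. The
rightmost term in (32.8) thus drops out. ■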


Absolute continuity was not used until the last step of the proof, and what 
the argument shows is that $\nu$ always has a decomposition (32.8) into an
absolutely continuous part and a singular part with respect to $\mu$. This is the
Lebesgue decomposition, and it generalizes the one in the preceding section 
(see (31.31)). 
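
For discrete measures on a finite space the decomposition (32.8) can be written down directly: the absolutely continuous part lives where $\mu$ has mass, and the singular part lives on the $\mu$-null points. The Python sketch below works this out in one assumed example using exact rational arithmetic; the point masses and the function name `lebesgue_decomposition` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as Fr

# Hypothetical point masses; nu is NOT assumed absolutely continuous here.
mu = {"a": Fr(1, 4), "b": Fr(3, 4), "c": Fr(0), "d": Fr(0)}
nu = {"a": Fr(1, 2), "b": Fr(0), "c": Fr(2), "d": Fr(0)}

def lebesgue_decomposition(nu, mu):
    """Return (f, nu_ac, nu_s) with nu = nu_ac + nu_s pointwise."""
    # Density of the absolutely continuous part (0 on mu-null points).
    f = {w: (nu[w] / mu[w] if mu[w] > 0 else Fr(0)) for w in mu}
    nu_ac = {w: f[w] * mu[w] for w in mu}       # nu_ac(E) = sum of f * mu over E
    nu_s = {w: nu[w] - nu_ac[w] for w in mu}    # what remains is the singular part
    return f, nu_ac, nu_s

f, nu_ac, nu_s = lebesgue_decomposition(nu, mu)

# S supports mu, and nu_s gives S no mass: nu_s(S) = mu(S^c) = 0.
S = {w for w in mu if mu[w] > 0}
print(f, nu_s, sum(nu_s[w] for w in S), sum(mu[w] for w in mu if w not in S))
```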


PROBLEMS 


32.1. There are two ways to show that the convergence in (32.1) must be absolute: 
Use the Jordan decomposition. Use the fact that a series converges absolutely 
if it has the same sum no matter what order the terms are taken in. 


32.2. If $A^+ \cup A^-$ is a Hahn decomposition of $\varphi$, there may be other ones $A_1^+ \cup A_1^-$.
Construct an example of this. Show that there is uniqueness to the extent that
$\varphi(A^+ \triangle A_1^+) = \varphi(A^- \triangle A_1^-) = 0$.


32.3. Show that absolute continuity does not imply the $\epsilon$-$\delta$ condition (32.4) if $\nu$ is
infinite. Hint: Let $\mathscr{F}$ consist of all subsets of the space of integers, let $\nu$ be
counting measure, and let $\mu$ have mass $n^{-2}$ at n. Note that $\mu$ is finite and $\nu$ is
$\sigma$-finite.


32.4. Show that the Radon-Nikodym theorem fails if $\mu$ is not $\sigma$-finite, even if $\nu$ is
finite. Hint: Let $\mathscr{F}$ consist of the countable and the cocountable sets in an
uncountable $\Omega$, let $\mu$ be counting measure, and let $\nu(A)$ be 0 or 1 as A is
countable or cocountable.


32.5. Let $\mu$ be the restriction of planar Lebesgue measure $\lambda_2$ to the $\sigma$-field
$\mathscr{F} = [A \times R^1: A \in \mathscr{R}^1]$ of vertical strips. Define $\nu$ on $\mathscr{F}$ by $\nu(A \times R^1) = \lambda_2(A
\times (0, 1))$. Show that $\nu$ is absolutely continuous with respect to $\mu$ but has no
density. Why does this not contradict the Radon-Nikodym theorem?


32.6. Let $\mu$, $\nu$, and $\rho$ be $\sigma$-finite measures on $(\Omega, \mathscr{F})$. Assume the Radon-Nikodym
derivatives here are everywhere nonnegative and finite.

(a) Show that $\nu \ll \mu$ and $\mu \ll \rho$ imply that $\nu \ll \rho$ and

$$\frac{d\nu}{d\rho} = \frac{d\nu}{d\mu} \frac{d\mu}{d\rho}.$$

(b) Show that $\nu \equiv \mu$ implies

$$\frac{d\nu}{d\mu} = I_{[d\mu/d\nu > 0]} \left(\frac{d\mu}{d\nu}\right)^{-1}.$$

(c) Suppose that $\mu \ll \rho$ and $\nu \ll \rho$, and let A be the set where $d\nu/d\rho > 0 =
d\mu/d\rho$. Show that $\nu \ll \mu$ if and only if $\rho(A) = 0$, in which case

$$\frac{d\nu}{d\mu} = I_{[d\mu/d\rho > 0]} \frac{d\nu/d\rho}{d\mu/d\rho}.$$


32.7. Show that there is a Lebesgue decomposition (32.8) in the $\sigma$-finite as well as
the finite case. Prove that it is unique.


32.8. The Radon-Nikodym theorem holds if $\mu$ is $\sigma$-finite, even if $\nu$ is not. Assume
at first that $\mu$ is finite (and $\nu \ll \mu$).

(a) Let $\mathscr{B}$ be the class of ($\mathscr{F}$-sets) B such that $\mu(E) = 0$ or $\nu(E) = \infty$ for each
$E \subset B$. Show that $\mathscr{B}$ contains a set $B_0$ of maximal $\mu$-measure.

(b) Let $\mathscr{C}$ be the class of sets in $\Omega_0 = B_0^c$ that are countable unions of sets of
finite $\nu$-measure. Show that $\mathscr{C}$ contains a set $C_0$ of maximal $\mu$-measure. Let
$D_0 = \Omega_0 - C_0$.

(c) Deduce from the maximality of $B_0$ and $C_0$ that $\mu(D_0) = \nu(D_0) = 0$.

(d) Let $\nu_0(A) = \nu(A \cap \Omega_0)$. Using the Radon-Nikodym theorem for the pair
$\mu$, $\nu_0$, prove it for $\mu$, $\nu$.

(e) Now show that the theorem holds if $\mu$ is merely $\sigma$-finite.

(f) Show that if the density can be taken everywhere finite, then $\nu$ is $\sigma$-finite.


32.9. Let $\mu$ and $\nu$ be finite measures on $(\Omega, \mathscr{F})$, and suppose that $\mathscr{F}^\circ$ is a $\sigma$-field
contained in $\mathscr{F}$. Then the restrictions $\mu^\circ$ and $\nu^\circ$ of $\mu$ and $\nu$ to $\mathscr{F}^\circ$ are
measures on $(\Omega, \mathscr{F}^\circ)$. Let $\nu_{ac}$, $\nu_s$, $\nu^\circ_{ac}$, $\nu^\circ_s$ be, respectively, the absolutely
continuous and singular parts of $\nu$ and $\nu^\circ$ with respect to $\mu$ and $\mu^\circ$. Show that
$\nu^\circ_{ac}(E) \ge \nu_{ac}(E)$ and $\nu^\circ_s(E) \le \nu_s(E)$ for $E \in \mathscr{F}^\circ$.


32.10. Suppose that $\mu$, $\nu$, $\nu_n$ are finite measures on $(\Omega, \mathscr{F})$ and that $\nu(A) = \sum_n \nu_n(A)$
for all A. Let $\nu_n(A) = \int_A f_n\,d\mu + \nu_n'(A)$ and $\nu(A) = \int_A f\,d\mu + \nu'(A)$ be the
decompositions (32.8); here $\nu'$ and $\nu_n'$ are singular with respect to $\mu$. Show that
$f = \sum_n f_n$ except on a set of $\mu$-measure 0 and that $\nu'(A) = \sum_n \nu_n'(A)$ for all A.
Show that $\nu \ll \mu$ if and only if $\nu_n \ll \mu$ for all n.




32.11. 32.2↑ Absolute continuity of a set function $\varphi$ with respect to a measure $\mu$ is
defined just as if $\varphi$ were itself a measure: $\mu(A) = 0$ must imply that $\varphi(A) = 0$.
Show that, if this holds and $\mu$ is $\sigma$-finite, then $\varphi(A) = \int_A f\,d\mu$ for some
integrable f. Show that $A^+ = [\omega: f(\omega) \ge 0]$ and $A^- = [\omega: f(\omega) < 0]$ give a
Hahn decomposition for $\varphi$. Show that the three variations satisfy $\varphi^+(A) =
\int_A f^+\,d\mu$, $\varphi^-(A) = \int_A f^-\,d\mu$, and $|\varphi|(A) = \int_A |f|\,d\mu$. Hint: To construct f, start
with (32.2).


32.12. ↑ A signed measure $\varphi$ is a set function that satisfies (32.1) if $A_1, A_2, \ldots$ are
disjoint and may assume one of the values $+\infty$ and $-\infty$ but not both. Extend
the Hahn and Jordan decompositions to signed measures.


32.13. 31.22↑ Suppose that $\mu$ and $\nu$ are a probability measure and a $\sigma$-finite
measure on the line and that $\nu \ll \mu$. Show that the Radon-Nikodym derivative
f satisfies

$$\lim_{h \downarrow 0} \frac{\nu(x - h, x + h]}{\mu(x - h, x + h]} = f(x)$$

on a set of $\mu$-measure 1.


32.14. Find on the unit interval uncountably many probability measures $\mu_p$, 0 < p < 1,
with supports $S_p$ such that $\mu_p\{x\} = 0$ for each x and p and the $S_p$ are disjoint
in pairs.


32.15. Let $\mathscr{F}_0$ be the field consisting of the finite and the cofinite sets in an
uncountable $\Omega$. Define $\varphi$ on $\mathscr{F}_0$ by taking $\varphi(A)$ to be the number of points in
A if A is finite, and the negative of the number of points in $A^c$ if A is cofinite.
Show that (32.1) holds (this is not true if $\Omega$ is countable). Show that there are
no negative sets for $\varphi$ (except the empty set), that there is no Hahn
decomposition, and that $\varphi$ does not have bounded range.


SECTION 33. CONDITIONAL PROBABILITY 


The concepts of conditional probability and expected value with respect to a 
$\sigma$-field underlie much of modern probability theory. The difficulty in
understanding these ideas has to do not with mathematical detail so much as with
probabilistic meaning, and the way to get at this meaning is through calcula- 
tions and examples, of which there are many in this section and the next. 


The Discrete Case 


Consider first the conditional probability of a set A with respect to another
set B. It is defined of course by $P(A|B) = P(A \cap B)/P(B)$, unless P(B)
vanishes, in which case it is not defined at all.

It is helpful to consider conditional probability in terms of an observer in
possession of partial information.† A probability space $(\Omega, \mathscr{F}, P)$ describes
the working of a mechanism, governed by chance, which produces a result $\omega$
distributed according to P; P(A) is for the observer the probability that the
point $\omega$ produced lies in A. Suppose now that $\omega$ lies in B and that the
observer learns this fact and no more. From the point of view of the observer,
now in possession of this partial information about $\omega$, the probability that $\omega$
also lies in A is P(A|B) rather than P(A). This is the idea lying back of the
definition.

†As always, observer, information, know, and so on are informal, nonmathematical terms; see the
related discussion in Section 4 (p. 57).

If, on the other hand, $\omega$ happens to lie in $B^c$ and the observer learns of
this, his probability instead becomes $P(A|B^c)$. These two conditional
probabilities can be linked together by the simple function

(33.1)    $f(\omega) = \begin{cases} P(A|B) & \text{if } \omega \in B, \\ P(A|B^c) & \text{if } \omega \in B^c. \end{cases}$

The observer learns whether $\omega$ lies in B or in $B^c$; his new probability for the
event $\omega \in A$ is then just $f(\omega)$. Although the observer does not in general
know the argument $\omega$ of f, he can calculate the value $f(\omega)$ because he knows
which of B and $B^c$ contains $\omega$. (Note conversely that from the value $f(\omega)$ it
is possible to determine whether $\omega$ lies in B or in $B^c$, unless $P(A|B) =
P(A|B^c)$; that is, unless A and B are independent, in which case the
conditional probability coincides with the unconditional one anyway.)

The sets B and $B^c$ partition $\Omega$, and these ideas carry over to the general
partition. Let $B_1, B_2, \ldots$ be a finite or countable partition of $\Omega$ into $\mathscr{F}$-sets,
and let $\mathscr{G}$ consist of all the unions of the $B_i$. Then $\mathscr{G}$ is the $\sigma$-field generated
by the $B_i$. For A in $\mathscr{F}$, consider the function with values

(33.2)    $f(\omega) = P(A|B_i) = \frac{P(A \cap B_i)}{P(B_i)}$  if $\omega \in B_i$,  $i = 1, 2, \ldots$.

If the observer learns which element $B_i$ of the partition it is that contains $\omega$,
then his new probability for the event $\omega \in A$ is $f(\omega)$. The partition $\{B_i\}$, or
equivalently the $\sigma$-field $\mathscr{G}$, can be regarded as an experiment, and to learn
which $B_i$ it is that contains $\omega$ is to learn the outcome of the experiment. For
this reason the function or random variable f defined by (33.2) is called the
conditional probability of A given $\mathscr{G}$ and is denoted $P[A\|\mathscr{G}]$. This is written
$P[A\|\mathscr{G}]_\omega$ whenever the argument $\omega$ needs to be explicitly shown.

Thus $P[A\|\mathscr{G}]$ is the function whose value on $B_i$ is the ordinary
conditional probability $P(A|B_i)$. This definition needs to be completed, because
$P(A|B_i)$ is not defined if $P(B_i) = 0$. In this case $P[A\|\mathscr{G}]$ will be taken to
have any constant value on $B_i$; the value is arbitrary but must be the same
over all of the set $B_i$. If there are nonempty sets $B_i$ for which $P(B_i) = 0$,
$P[A\|\mathscr{G}]$ therefore stands for any one of a family of functions on $\Omega$. A
specific such function is for emphasis often called a version of the conditional
probability. Note that any two versions are equal except on a set of
probability 0.
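
The definition (33.2) is easy to carry out on a finite space. The Python sketch below builds one version of the conditional probability for an assumed example (a fair die, a three-cell partition, and the event "even"); the space, the partition, the `default` value used on cells of probability 0, and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction as Fr

omega = range(1, 7)                        # assumed example: one roll of a fair die
P = {w: Fr(1, 6) for w in omega}
partition = [{1, 2}, {3, 4, 5}, {6}]       # the cells B_1, B_2, B_3
A = {2, 4, 6}                              # the event that the roll is even

def prob(E):
    """P(E) for a subset E of the finite sample space."""
    return sum(P[w] for w in E)

def conditional_probability(A, partition, default=Fr(0)):
    """One version of P[A || G]: constant on each cell of the partition."""
    version = {}
    for B in partition:
        # On cells of probability 0 any constant is allowed; use `default`.
        value = prob(A & B) / prob(B) if prob(B) > 0 else default
        for w in B:
            version[w] = value
    return version

f = conditional_probability(A, partition)
print({w: str(f[w]) for w in omega})

# Sanity check (law of total probability): averaging f recovers P(A).
print(sum(f[w] * P[w] for w in omega), prob(A))
```

An observer who is told only the cell containing $\omega$ can evaluate f, since f is constant on each cell.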


Example 33.1. Consider the Poisson process. Suppose that $0 < s < t$, and
let $A = [N_s = 0]$ and $B_i = [N_t = i]$, $i = 0, 1, \ldots$. Since the increments are
independent (Section 23), $P(A|B_i) = P[N_s = 0]\,P[N_t - N_s = i]/P[N_t = i]$, and
since they have Poisson distributions (see (23.9)), a simple calculation reduces
this to

(33.3)    $P[N_s = 0\|\mathscr{G}]_\omega = \left(1 - \frac{s}{t}\right)^i$  if $\omega \in B_i$, $i = 0, 1, 2, \ldots$.

Since $i = N_t(\omega)$ on $B_i$, this can be written

(33.4)    $P[N_s = 0\|\mathscr{G}]_\omega = \left(1 - \frac{s}{t}\right)^{N_t(\omega)}$.

Here the experiment or observation corresponding to $\{B_i\}$ or $\mathscr{G}$ determines
the number of events (telephone calls, say) occurring in the time
interval $[0, t]$. For an observer who knows this number but not the locations
of the calls within $[0, t]$, (33.4) gives his probability for the event that none of
them occurred before time $s$. Although this observer does not know $\omega$, he
knows $N_t(\omega)$, which is all he needs to calculate the right side of (33.4). ■
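The "simple calculation" behind (33.3) can be checked numerically. The sketch below assumes the process has some rate `alpha`, so that $N_t$ is Poisson with mean `alpha * t`; the specific values of `s`, `t`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math

def poisson_pmf(k, mean):
    """P[N = k] for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def cond_prob_no_early_event(i, s, t, alpha):
    """P(A | B_i) with A = [N_s = 0] and B_i = [N_t = i], using independent increments."""
    numerator = poisson_pmf(0, alpha * s) * poisson_pmf(i, alpha * (t - s))
    return numerator / poisson_pmf(i, alpha * t)

s, t, alpha = 1.0, 4.0, 2.5   # hypothetical values; alpha is the assumed rate
for i in range(6):
    direct = cond_prob_no_early_event(i, s, t, alpha)
    closed_form = (1 - s / t) ** i      # right side of (33.3)
    print(i, direct, closed_form)
```

The rate cancels in the ratio, which is why (33.3) depends only on $s/t$ and $i$.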


Example 33.2. Suppose that $X_0, X_1, \ldots$ is a Markov chain with state
space $S$ as in Section 8. The events

(33.5)    $[X_0 = i_0, \ldots, X_n = i_n]$

form a finite or countable partition of $\Omega$ as $i_0, \ldots, i_n$ range over $S$. If $\mathscr{G}_n$ is
the σ-field generated by this partition, then by the defining condition (8.2) for
Markov chains, $P[X_{n+1} = j\|\mathscr{G}_n]_\omega = p_{i_n j}$ holds for $\omega$ in (33.5). The sets

(33.6)    $[X_n = i]$

for $i \in S$ also partition $\Omega$, and they generate a σ-field $\mathscr{G}_n^0$ smaller than $\mathscr{G}_n$.
Now (8.2) also stipulates $P[X_{n+1} = j\|\mathscr{G}_n^0]_\omega = p_{ij}$ for $\omega$ in (33.6), and the
essence of the Markov property is that

(33.7)    $P[X_{n+1} = j\|\mathscr{G}_n] = P[X_{n+1} = j\|\mathscr{G}_n^0]$. ■


The General Case 


If $\mathscr{G}$ is the σ-field generated by a partition $B_1, B_2, \ldots$, then the general
element of $\mathscr{G}$ is a disjoint union $B_{i_1} \cup B_{i_2} \cup \cdots$, finite or countable, of
certain of the $B_i$. To know which set $B_i$ it is that contains $\omega$ is the same thing
as to know which sets in $\mathscr{G}$ contain $\omega$ and which do not. This second way of
looking at the matter carries over to the general σ-field $\mathscr{G}$ contained in $\mathscr{F}$.
(As always, the probability space is $(\Omega, \mathscr{F}, P)$.) The σ-field $\mathscr{G}$ will not in
general come from a partition as above.

One can imagine an observer who knows for each $G$ in $\mathscr{G}$ whether $\omega \in G$
or $\omega \in G^c$. Thus the σ-field $\mathscr{G}$ can in principle be identified with an
experiment or observation. This is the point of view adopted in Section 4; see
p. 57. It is natural to try to define conditional probabilities $P[A\|\mathscr{G}]$ with
respect to the experiment $\mathscr{G}$. To do this, fix an $A$ in $\mathscr{F}$ and define a finite
measure $\nu$ on $\mathscr{G}$ by

    $\nu(G) = P(A \cap G)$,  $G \in \mathscr{G}$.

Then $P(G) = 0$ implies that $\nu(G) = 0$. The Radon-Nikodym theorem can be
applied to the measures $\nu$ and $P$ on the measurable space $(\Omega, \mathscr{G})$ because
the first one is absolutely continuous with respect to the second.† It follows
that there exists a function or random variable $f$, measurable $\mathscr{G}$ and
integrable with respect to $P$, such that† $P(A \cap G) = \nu(G) = \int_G f\,dP$ for all $G$
in $\mathscr{G}$.

Denote this function $f$ by $P[A\|\mathscr{G}]$. It is a random variable with two
properties:

(i) $P[A\|\mathscr{G}]$ is measurable $\mathscr{G}$ and integrable.
(ii) $P[A\|\mathscr{G}]$ satisfies the functional equation

(33.8)    $\int_G P[A\|\mathscr{G}]\,dP = P(A \cap G)$,  $G \in \mathscr{G}$.

There will in general be many such random variables $P[A\|\mathscr{G}]$, but any two
of them are equal with probability 1. A specific such random variable is
called a version of the conditional probability.

If $\mathscr{G}$ is generated by a partition $B_1, B_2, \ldots$, the function $f$ defined by
(33.2) is measurable $\mathscr{G}$ because $[\omega: f(\omega) \in H]$ is the union of those $B_i$ over
which the constant value of $f$ lies in $H$. Any $G$ in $\mathscr{G}$ is a disjoint union
$G = \bigcup_k B_{i_k}$, and $P(A \cap G) = \sum_k P(A|B_{i_k})P(B_{i_k})$, so that (33.2) satisfies (33.8)
as well. Thus the general definition is an extension of the one for the discrete
case.
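The discrete case can be verified by brute force on a small space. The sketch below uses a hypothetical six-point sample space (a fair die) with a hypothetical partition and event; it builds the function of (33.2) and checks the functional equation (33.8) over every union of blocks, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite sample space: a fair die, partitioned into three blocks.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
blocks = [{1, 2}, {3, 4, 5}, {6}]
A = {2, 3, 6}

def prob(event):
    """Probability of a finite event."""
    return sum(P[w] for w in event)

def version(w):
    """Value of f at w as in (33.2): P(A | B_i) on the block B_i containing w."""
    block = next(b for b in blocks if w in b)
    return prob(A & block) / prob(block)

def integral_over(G):
    """Integral of f over G with respect to P (a finite sum here)."""
    return sum(version(w) * P[w] for w in G)

def sigma_field_sets():
    """All unions of blocks: the sigma-field generated by the partition."""
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            yield set().union(*chosen)

# Check (33.8) for every G in the generated sigma-field.
checks = [integral_over(G) == prob(A & G) for G in sigma_field_sets()]
print(all(checks), len(checks))
```

Every block here has positive probability, so there is only one version; a block of probability 0 would leave the value of `version` on it free.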

Condition (i) in the definition above in effect requires that the values of
$P[A\|\mathscr{G}]$ depend only on the sets in $\mathscr{G}$. An observer who knows the outcome
of $\mathscr{G}$ viewed as an experiment knows for each $G$ in $\mathscr{G}$ whether it contains $\omega$
or not; for each $x$ he knows this in particular for the set $[\omega': P[A\|\mathscr{G}]_{\omega'} = x]$,
and hence he knows in principle the functional value $P[A\|\mathscr{G}]_\omega$ even if he
does not know $\omega$ itself. In Example 33.1 a knowledge of $N_t(\omega)$ suffices to
determine the value of (33.4); $\omega$ itself is not needed.

†Let $P_0$ be the restriction of $P$ to $\mathscr{G}$ (Example 10.4), and find on $(\Omega, \mathscr{G})$ a density $f$ for $\nu$ with
respect to $P_0$. Then, for $G \in \mathscr{G}$, $\nu(G) = \int_G f\,dP_0 = \int_G f\,dP$ (Example 16.4). If $g$ is another such
density, then $P[f \ne g] = P_0[f \ne g] = 0$.

Condition (ii) in the definition has a gambling interpretation. Suppose that
the observer, after he has learned the outcome of $\mathscr{G}$, is offered the opportunity
to bet on the event $A$ (unless $A$ lies in $\mathscr{G}$, he does not yet know whether
or not it occurred). He is required to pay an entry fee of $P[A\|\mathscr{G}]$ units and
will win 1 unit if $A$ occurs and nothing otherwise. If the observer decides to
bet and pays his fee, he gains $1 - P[A\|\mathscr{G}]$ if $A$ occurs and $-P[A\|\mathscr{G}]$
otherwise, so that his gain is

    $(1 - P[A\|\mathscr{G}])\,I_A + (-P[A\|\mathscr{G}])\,I_{A^c} = I_A - P[A\|\mathscr{G}]$.

If he declines to bet, his gain is of course 0. Suppose that he adopts the
strategy of betting if $G$ occurs but not otherwise, where $G$ is some set in $\mathscr{G}$.
He can actually carry out this strategy, since after learning the outcome of
the experiment $\mathscr{G}$ he knows whether or not $G$ occurred. His expected gain
with this strategy is his gain integrated over $G$:

    $\int_G (I_A - P[A\|\mathscr{G}])\,dP$.

But (33.8) is exactly the requirement that this vanish for each $G$ in $\mathscr{G}$.
Condition (ii) requires then that each strategy be fair in the sense that the
observer stands neither to win nor to lose on the average. Thus $P[A\|\mathscr{G}]$ is
the just entry fee, as intuition requires.
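The fairness claim amounts to a one-line computation. Written out, it uses only (33.8) and the fact that the integral of the indicator $I_A$ over $G$ is $P(A \cap G)$:

```latex
\int_G \bigl(I_A - P[A\|\mathscr{G}]\bigr)\,dP
  = \int_G I_A\,dP - \int_G P[A\|\mathscr{G}]\,dP
  = P(A \cap G) - P(A \cap G)
  = 0, \qquad G \in \mathscr{G}.
```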


Example 33.3. Suppose that $A \in \mathscr{G}$, which will always hold if $\mathscr{G}$ coincides
with the whole σ-field $\mathscr{F}$. Then $I_A$ satisfies conditions (i) and (ii), so
that $P[A\|\mathscr{G}] = I_A$ with probability 1. If $A \in \mathscr{G}$, then to know the outcome of
$\mathscr{G}$ viewed as an experiment is in particular to know whether or not $A$ has
occurred. ■


Example 33.4. If $\mathscr{G}$ is $\{\emptyset, \Omega\}$, the smallest possible σ-field, every function
measurable $\mathscr{G}$ must be constant. Therefore, $P[A\|\mathscr{G}]_\omega = P(A)$ for all $\omega$ in
this case. The observer learns nothing from the experiment $\mathscr{G}$. ■


According to these two examples, $P[A\|\{\emptyset, \Omega\}]$ is identically $P(A)$, whereas
$I_A$ is a version of $P[A\|\mathscr{F}]$. For any $\mathscr{G}$, the function identically equal to
$P(A)$ satisfies condition (i) in the definition of conditional probability, whereas
$I_A$ satisfies condition (ii). Condition (i) becomes more stringent as $\mathscr{G}$ decreases,
and condition (ii) becomes more stringent as $\mathscr{G}$ increases. The two
conditions work in opposite directions and between them delimit the class of
versions of $P[A\|\mathscr{G}]$.




Example 33.5. Let $\Omega$ be the plane $R^2$ and let $\mathscr{F}$ be the class $\mathscr{R}^2$ of
planar Borel sets. A point of $\Omega$ is a pair $(x, y)$ of reals. Let $\mathscr{G}$ be the σ-field
consisting of the vertical strips, the product sets $E \times R^1 = [(x, y): x \in E]$,
where $E$ is a linear Borel set. If the observer knows for each strip $E \times R^1$
whether or not it contains $(x, y)$, then, as he knows this for each one-point
set $E$, he knows the value of $x$. Thus the experiment $\mathscr{G}$ consists in the
determination of the first coordinate of the sample point. Suppose now that
$P$ is a probability measure on $\mathscr{R}^2$ having a density $f(x, y)$ with respect to
planar Lebesgue measure: $P(A) = \iint_A f(x, y)\,dx\,dy$. Let $A$ be a horizontal
strip $R^1 \times F = [(x, y): y \in F]$, $F$ being a linear Borel set. The conditional
probability $P[A\|\mathscr{G}]$ can be calculated explicitly.

Put

(33.9)    $\varphi(x, y) = \frac{\int_F f(x, t)\,dt}{\int_{R^1} f(x, t)\,dt}$.

Set $\varphi(x, y) = 0$, say, at points where the denominator here vanishes; these
points form a set of $P$-measure 0. Since $\varphi(x, y)$ is a function of $x$ alone, it is
measurable $\mathscr{G}$. The general element of $\mathscr{G}$ being $E \times R^1$, it will follow that $\varphi$
is a version of $P[A\|\mathscr{G}]$ if it is shown that

(33.10)    $\int_{E \times R^1} \varphi(x, y)\,dP(x, y) = P(A \cap (E \times R^1))$.

Since $A = R^1 \times F$, the right side here is $P(E \times F)$. Since $P$ has density $f$,
Theorem 16.11 and Fubini's theorem reduce the left side to

    $\int_E \left\{ \int_{R^1} \varphi(x, y) f(x, y)\,dy \right\} dx = \int_E \left\{ \int_F f(x, t)\,dt \right\} dx$

    $= \iint_{E \times F} f(x, y)\,dx\,dy = P(E \times F)$.

Thus (33.9) does give a version of $P[R^1 \times F\|\mathscr{G}]$. ■
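Equation (33.10) can be checked numerically for a concrete density. The sketch below assumes the hypothetical density $f(x, y) = x + y$ on the unit square (0 elsewhere), so integrals over $R^1$ reduce to integrals over $[0, 1]$; the sets `E` and `F` are arbitrary illustrative intervals, and the midpoint rule stands in for exact integration.

```python
def density(x, y):
    """Hypothetical density on the plane: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def midpoint_integral(func, a, b, n=400):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

F = (0.0, 0.5)    # the set F defining the horizontal strip A = R^1 x F
E = (0.25, 0.75)  # the set E defining the vertical strip E x R^1

def phi(x):
    """Right side of (33.9), taken to be 0 where the denominator vanishes."""
    denom = midpoint_integral(lambda t: density(x, t), 0.0, 1.0)
    numer = midpoint_integral(lambda t: density(x, t), *F)
    return numer / denom if denom > 0 else 0.0

# Left side of (33.10): integrate phi against P over E x R^1.
lhs = midpoint_integral(
    lambda x: phi(x) * midpoint_integral(lambda y: density(x, y), 0.0, 1.0), *E)
# Right side of (33.10): P(E x F), computed directly from the density.
rhs = midpoint_integral(
    lambda x: midpoint_integral(lambda y: density(x, y), *F), *E)
print(lhs, rhs)
```

Because `phi` depends on `x` alone, the inner integral of the density cancels its denominator, which is exactly the Fubini step in the example.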


The right side of (33.9) is the classical formula for the conditional
probability of the event $R^1 \times F$ (the event that $y \in F$) given the event
$\{x\} \times R^1$ (given the value of $x$). Since the event $\{x\} \times R^1$ has probability 0,
the formula $P(A|B) = P(A \cap B)/P(B)$ does not work here. The whole point
of this section is the systematic development of a notion of conditional
probability that covers conditioning with respect to events of probability 0.
This is accomplished by conditioning with respect to collections of
events, that is, with respect to σ-fields $\mathscr{G}$.




Example 33.6. The set $A$ is by definition independent of the σ-field $\mathscr{G}$ if
it is independent of each $G$ in $\mathscr{G}$: $P(A \cap G) = P(A)P(G)$. This being the
same thing as $P(A \cap G) = \int_G P(A)\,dP$, $A$ is independent of $\mathscr{G}$ if and only if
$P[A\|\mathscr{G}] = P(A)$ with probability 1. ■


The σ-field $\sigma(X)$ generated by a random variable $X$ consists of the sets
$[\omega: X(\omega) \in H]$ for $H \in \mathscr{R}^1$; see Theorem 20.1. The conditional probability
of $A$ given $X$ is defined as $P[A\|\sigma(X)]$ and is denoted $P[A\|X]$. Thus
$P[A\|X] = P[A\|\sigma(X)]$ by definition. From the experiment corresponding to
the σ-field $\sigma(X)$, one learns which of the sets $[\omega': X(\omega') = x]$ contains $\omega$ and
hence learns the value $X(\omega)$. Example 33.5 is a case of this: take $X(x, y) = x$
for $(x, y)$ in the sample space $\Omega = R^2$ there.

This definition applies without change to a random vector, or, equivalently,
to a finite set of random variables. It can be adapted to arbitrary sets of
random variables as well. For any such set $[X_t, t \in T]$, the σ-field
$\sigma[X_t, t \in T]$ it generates is the smallest σ-field with respect to which each $X_t$ is
measurable. It is generated by the collection of sets of the form
$[\omega: X_t(\omega) \in H]$ for $t$ in $T$ and $H$ in $\mathscr{R}^1$. The conditional probability
$P[A\|X_t, t \in T]$ of $A$ with respect to this set of random variables is by definition
the conditional probability $P[A\|\sigma[X_t, t \in T]]$ of $A$ with respect to the
σ-field $\sigma[X_t, t \in T]$.

In this notation the property (33.7) of Markov chains becomes

(33.11)    $P[X_{n+1} = j\|X_0, \ldots, X_n] = P[X_{n+1} = j\|X_n]$.

The conditional probability of $[X_{n+1} = j]$ is the same for someone who knows
the present state $X_n$ as for someone who knows the present state $X_n$ and the
past states $X_0, \ldots, X_{n-1}$ as well.


Example 33.7. Let $X$ and $Y$ be random vectors of dimensions $j$ and $k$,
let $\mu$ be the distribution of $X$ over $R^j$, and suppose that $X$ and $Y$ are
independent. According to (20.30),

    $P[X \in H, (X, Y) \in J] = \int_H P[(x, Y) \in J]\,\mu(dx)$

for $H \in \mathscr{R}^j$ and $J \in \mathscr{R}^{j+k}$. This is a consequence of Fubini's theorem; it has
a conditional-probability interpretation. For each $x$ in $R^j$ put

(33.12)    $f(x) = P[(x, Y) \in J] = P[\omega': (x, Y(\omega')) \in J]$.

By Theorem 20.1(ii), $f(X(\omega))$ is measurable $\sigma(X)$, and since $\mu$ is the
distribution of $X$, a change of variable gives

    $\int_{[X \in H]} f(X(\omega))\,P(d\omega) = \int_H f(x)\,\mu(dx) = P([(X, Y) \in J] \cap [X \in H])$.

Since $[X \in H]$ is the general element of $\sigma(X)$, this proves that

(33.13)    $f(X(\omega)) = P[(X, Y) \in J\|X]_\omega$

with probability 1. ■

The fact just proved can be written

    $P[(X, Y) \in J\|X]_\omega = P[(X(\omega), Y) \in J]$
    $= P[\omega': (X(\omega), Y(\omega')) \in J]$.

Replacing $\omega'$ by $\omega$ on the right here causes a notational collision like the one
replacing $y$ by $x$ causes in $\int_a^x f(x, y)\,dy$.


Suppose that $X$ and $Y$ are independent random variables and that $Y$ has
distribution function $F$. For $J = [(u, v): \max\{u, v\} \le m]$, (33.12) is 0 for $m < x$
and $F(m)$ for $m \ge x$; if $M = \max\{X, Y\}$, then (33.13) gives

(33.14)    $P[M \le m\|X]_\omega = I_{[X \le m]}(\omega)\,F(m)$

with probability 1. All equations involving conditional probabilities must be
qualified in this way by the phrase with probability 1, because the conditional
probability is unique only to within a set of probability 0.
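For a discrete illustration of (33.14), the conditioning events $[X = x]$ have positive probability and the formula can be checked by counting. The sketch below assumes, purely for illustration, that $X$ and $Y$ are independent and uniform on $\{1, \ldots, 6\}$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete case: X and Y independent, each uniform on {1, ..., 6}.
values = range(1, 7)
outcomes = list(product(values, values))      # equally likely pairs (x, y)
weight = Fraction(1, len(outcomes))

def F(m):
    """Distribution function of Y: P[Y <= m]."""
    return Fraction(sum(1 for v in values if v <= m), len(values))

def cond_prob_max_le(m, x):
    """P[max(X, Y) <= m | X = x], by direct counting over the event [X = x]."""
    on_x = [(u, v) for (u, v) in outcomes if u == x]
    hits = [(u, v) for (u, v) in on_x if max(u, v) <= m]
    return (len(hits) * weight) / (len(on_x) * weight)

def formula(m, x):
    """Right side of (33.14): indicator of [x <= m] times F(m)."""
    return F(m) if x <= m else Fraction(0)

agree = all(cond_prob_max_le(m, x) == formula(m, x)
            for m in values for x in values)
print(agree)
```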

The following theorem is useful for checking conditional probabilities. 


Theorem 33.1. Let $\mathscr{P}$ be a π-system generating the σ-field $\mathscr{G}$, and suppose
that $\Omega$ is a finite or countable union of sets in $\mathscr{P}$. An integrable function $f$ is a
version of $P[A\|\mathscr{G}]$ if it is measurable $\mathscr{G}$ and if

(33.15)    $\int_G f\,dP = P(A \cap G)$

holds for all $G$ in $\mathscr{P}$.

PROOF. Apply Theorem 10.4. ■

The condition that $\Omega$ is a finite or countable union of $\mathscr{P}$-sets cannot be
suppressed; see Example 10.5.


Example 33.8. Suppose that $X$ and $Y$ are independent random variables with a
common distribution function $F$ that is positive and continuous. What is the
conditional probability of $[X \le x]$ given the random variable $M = \max\{X, Y\}$? As it should
clearly be 1 if $M \le x$, suppose that $M > x$. Since $X \le x$ requires $M = Y$, the chance of
which is $\frac{1}{2}$ by symmetry, the conditional probability of $[X \le x]$ should by
independence be $\frac{1}{2}F(x)/F(m) = \frac{1}{2}P[X \le x | X \le m]$ with the random variable $M$ substituted
for $m$. Intuition thus gives

(33.16)    $P[X \le x\|M]_\omega = I_{[M \le x]}(\omega) + \frac{1}{2} I_{[M > x]}(\omega) \frac{F(x)}{F(M(\omega))}$.

It suffices to check (33.15) for sets $G = [M \le m]$, because these form a π-system
generating $\sigma(M)$. The functional equation reduces to

(33.17)    $P[M \le \min\{x, m\}] + \frac{1}{2} \int_{[x < M \le m]} \frac{F(x)}{F(M)}\,dP = P[M \le m, X \le x]$.

Since the other case is easy, suppose that $x < m$. Since the distribution of $(X, Y)$ is
product measure, it follows by Fubini's theorem and the assumed continuity of $F$ that

    $\int_{[x < M \le m]} \frac{1}{F(M)}\,dP = 2 \iint_{u \le v,\ x < v \le m} \frac{1}{F(v)}\,dF(u)\,dF(v) = 2(F(m) - F(x))$,

which gives (33.17). ■
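As a concrete check of (33.17), one can take $F$ to be the uniform distribution function on $[0, 1]$, so that $F(u) = u$ there. This choice is an assumption made only for illustration ($F$ is positive only on $(0, 1]$, which is where $M$ takes its values with probability 1). Then $M$ has density $2u$ on $[0, 1]$, and for $0 < x < m \le 1$ both sides reduce to $xm$:

```latex
P[M \le \min\{x, m\}] + \frac{1}{2}\int_{[x < M \le m]} \frac{F(x)}{F(M)}\,dP
  = x^2 + \frac{1}{2}\int_x^m \frac{x}{u}\,2u\,du
  = x^2 + x(m - x)
  = xm
  = P[X \le x]\,P[Y \le m]
  = P[M \le m,\ X \le x].
```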


Example 33.9. A collection $[X_t: t \ge 0]$ of random variables is a Markov
process in continuous time if for $k \ge 1$, $0 \le t_1 \le \cdots \le t_k \le u$, and $H \in \mathscr{R}^1$,

(33.18)    $P[X_u \in H\|X_{t_1}, \ldots, X_{t_k}] = P[X_u \in H\|X_{t_k}]$

holds with probability 1. The analogue for discrete time is (33.11). (The $X_n$
there have countable range as well, and the transition probabilities are
constant in time, conditions that are not imposed here.)

Suppose that $t \le u$. Looking on the right side of (33.18) as a version of the
conditional probability on the left shows that

(33.19)    $\int_G P[X_u \in H\|X_t]\,dP = P([X_u \in H] \cap G)$

if $0 \le t_1 \le \cdots \le t_k = t \le u$ and $G \in \sigma(X_{t_1}, \ldots, X_{t_k})$. Fix $t$, $u$, and $H$, and let
$k$ and $t_1, \ldots, t_k$ vary. Consider the class $\mathscr{P} = \bigcup \sigma(X_{t_1}, \ldots, X_{t_k})$, the union
extending over all $k \ge 1$ and all $k$-tuples satisfying $0 \le t_1 \le \cdots \le t_k = t$.
If $A \in \sigma(X_{t_1}, \ldots, X_{t_k})$ and $B \in \sigma(X_{s_1}, \ldots, X_{s_j})$, then $A \cap B \in \sigma(X_{r_1}, \ldots, X_{r_i})$,
where the $r_\alpha$ are the $s_\beta$ and the $t_\gamma$ merged together. Thus $\mathscr{P}$ is a π-system.
Since $\mathscr{P}$ generates $\sigma[X_s: s \le t]$ and $P[X_u \in H\|X_t]$ is measurable with
respect to this σ-field, it follows by (33.19) and Theorem 33.1 that
$P[X_u \in H\|X_t]$ is a version of $P[X_u \in H\|X_s, s \le t]$:

(33.20)    $P[X_u \in H\|X_s, s \le t] = P[X_u \in H\|X_t]$,  $t \le u$,

with probability 1.

This says that for calculating conditional probabilities about the future,
the present $\sigma(X_t)$ is equivalent to the present and the entire past $\sigma[X_s: s \le t]$.
This follows from the apparently weaker condition (33.18). ■


Example 33.10. The Poisson process $[N_t: t \ge 0]$ has independent increments
(Section 23). Suppose that $0 \le t_1 \le \cdots \le t_k \le u$. The random vector
$(N_{t_1}, N_{t_2} - N_{t_1}, \ldots, N_{t_k} - N_{t_{k-1}})$ is independent of $N_u - N_{t_k}$, and so (Theorem
20.2) $(N_{t_1}, N_{t_2}, \ldots, N_{t_k})$ is independent of $N_u - N_{t_k}$. If $J$ is the set of points
$(x_1, \ldots, x_k, y)$ in $R^{k+1}$ such that $x_k + y \in H$, where $H \in \mathscr{R}^1$, and if $\nu$ is the
distribution of $N_u - N_{t_k}$, then (33.12) is $P[(x_1, \ldots, x_k, N_u - N_{t_k}) \in J] =
P[x_k + N_u - N_{t_k} \in H] = \nu(H - x_k)$. Therefore, (33.13) gives
$P[N_u \in H\|N_{t_1}, \ldots, N_{t_k}] = \nu(H - N_{t_k})$. This holds also if $k = 1$, and hence
$P[N_u \in H\|N_{t_1}, \ldots, N_{t_k}] = P[N_u \in H\|N_{t_k}]$. The Poisson process thus has the Markov
property (33.18); this is a consequence solely of the independence of the
increments. The extended Markov property (33.20) follows. ■


Properties of Conditional Probability 


Theorem 33.2. With probability 1, $P[\emptyset\|\mathscr{G}] = 0$, $P[\Omega\|\mathscr{G}] = 1$, and

(33.21)    $0 \le P[A\|\mathscr{G}] \le 1$

for each $A$. If $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint sets, then

(33.22)    $P\left[\bigcup_n A_n \,\Big\|\, \mathscr{G}\right] = \sum_n P[A_n\|\mathscr{G}]$

with probability 1.

PROOF. For each version of the conditional probability, $\int_G P[A\|\mathscr{G}]\,dP =
P(A \cap G) \ge 0$ for each $G$ in $\mathscr{G}$; since $P[A\|\mathscr{G}]$ is measurable $\mathscr{G}$, it must be
nonnegative except on a set of $P$-measure 0. The other inequality in (33.21) is
proved the same way.

If the $A_n$ are disjoint and if $G$ lies in $\mathscr{G}$, it follows (Theorem 16.6) that

    $\int_G \left( \sum_n P[A_n\|\mathscr{G}] \right) dP = \sum_n \int_G P[A_n\|\mathscr{G}]\,dP = \sum_n P(A_n \cap G)$
    $= P\left( \left( \bigcup_n A_n \right) \cap G \right)$.

Thus $\sum_n P[A_n\|\mathscr{G}]$, which is certainly measurable $\mathscr{G}$, satisfies the functional
equation for $P[\bigcup_n A_n\|\mathscr{G}]$, and so must coincide with it except perhaps on a
set of $P$-measure 0. Hence (33.22). ■




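Additional useful facts can be established by similar arguments. If $A \subset B$, then

(33.23)    $P[B - A\|\mathscr{G}] = P[B\|\mathscr{G}] - P[A\|\mathscr{G}]$,  $P[A\|\mathscr{G}] \le P[B\|\mathscr{G}]$.

The inclusion-exclusion formula

(33.24)    $P\left[\bigcup_{i=1}^n A_i \,\Big\|\, \mathscr{G}\right] = \sum_i P[A_i\|\mathscr{G}] - \sum_{i<j} P[A_i \cap A_j\|\mathscr{G}] + \cdots$

holds. If $A_n \uparrow A$, then

(33.25)    $P[A_n\|\mathscr{G}] \uparrow P[A\|\mathscr{G}]$,

and if $A_n \downarrow A$, then

(33.26)    $P[A_n\|\mathscr{G}] \downarrow P[A\|\mathscr{G}]$.

Further, $P(A) = 1$ implies that

(33.27)    $P[A\|\mathscr{G}] = 1$,

and $P(A) = 0$ implies that

(33.28)    $P[A\|\mathscr{G}] = 0$.

Of course (33.23) through (33.28) hold with probability 1 only.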


Difficulties and Curiosities 


This section has been devoted almost entirely to examples connecting the
abstract definition (33.8) with the probabilistic idea lying back of it. There are
pathological examples showing that the interpretation of conditional probability
in terms of an observer with partial information breaks down in certain
cases.


Example 33.11. Let $(\Omega, \mathscr{F}, P)$ be the unit interval $\Omega$ with Lebesgue
measure $P$ on the σ-field $\mathscr{F}$ of Borel subsets of $\Omega$. Take $\mathscr{G}$ to be the σ-field
of sets that are either countable or cocountable. Then the function identically
equal to $P(A)$ is a version of $P[A\|\mathscr{G}]$: (33.8) holds because $P(G)$ is either 0
or 1 for every $G$ in $\mathscr{G}$. Therefore,

(33.29)    $P[A\|\mathscr{G}]_\omega = P(A)$

with probability 1. But since $\mathscr{G}$ contains all one-point sets, to know which
elements of $\mathscr{G}$ contain $\omega$ is to know $\omega$ itself. Thus $\mathscr{G}$ viewed as an
experiment should be completely informative (the observer given the information
in $\mathscr{G}$ should know $\omega$ exactly), and so one might expect that

(33.30)    $P[A\|\mathscr{G}]_\omega = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases}$

This is Example 4.10 in a new form. ■


The mathematical definition gives (33.29); the heuristic considerations 
lead to (33.30). Of course, (33.29) is right and (33.30) is wrong. The heuristic 
view breaks down in certain cases but is nonetheless illuminating and cannot, 
since it does not intervene in proofs, lead to any difficulties. 

The point of view in this section has been "global." To each fixed $A$ in $\mathscr{F}$
has been attached a function (usually a family of functions) $P[A\|\mathscr{G}]_\omega$
defined over all of $\Omega$. What happens if the point of view is reversed, if $\omega$ is
fixed and $A$ varies over $\mathscr{F}$? Will this result in a probability measure on $\mathscr{F}$?
Intuition says it should, and if it does, then (33.21) through (33.28) all reduce
to standard facts about measures.

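Suppose that $B_1, \ldots, B_r$ is a partition of $\Omega$ into $\mathscr{F}$-sets, and let $\mathscr{G} =
\sigma(B_1, \ldots, B_r)$. If $P(B_1) = 0$ and $P(B_i) > 0$ for the other $i$, then one version of
$P[A\|\mathscr{G}]$ is

    $P[A\|\mathscr{G}]_\omega = \begin{cases} 17 & \text{if } \omega \in B_1, \\ \dfrac{P(A \cap B_i)}{P(B_i)} & \text{if } \omega \in B_i,\ i = 2, \ldots, r. \end{cases}$

With this choice of version for each $A$, $P[A\|\mathscr{G}]_\omega$ is, as a function of $A$, a
probability measure on $\mathscr{F}$ if $\omega \in B_2 \cup \cdots \cup B_r$, but not if $\omega \in B_1$. The
"wrong" versions have been chosen. If, for example,

    $P[A\|\mathscr{G}]_\omega = \begin{cases} P(A) & \text{if } \omega \in B_1, \\ \dfrac{P(A \cap B_i)}{P(B_i)} & \text{if } \omega \in B_i,\ i = 2, \ldots, r, \end{cases}$

then $P[A\|\mathscr{G}]_\omega$ is a probability measure in $A$ for each $\omega$. Clearly, versions
such as this one exist if $\mathscr{G}$ is finite.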
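The difference between the two choices of version shows up already on a four-point space. The sketch below uses a hypothetical space in which $B_1 = \{0\}$ has probability 0; the helper names `wrong` and `right` are introduced here for illustration and stand for the constant 17 and the choice $P(A)$ on the null block.

```python
from fractions import Fraction

# Hypothetical four-point space; B_1 = {0} has probability 0.
P = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
blocks = [{0}, {1, 2}, {3}]          # B_1, B_2, B_3
omega = set(P)

def prob(event):
    """Probability of a finite event."""
    return sum(P[w] for w in event)

def version(A, w, null_value):
    """P[A||G] at w; null_value(A) is the arbitrary constant used on null blocks."""
    block = next(b for b in blocks if w in b)
    if prob(block) == 0:
        return null_value(A)
    return prob(A & block) / prob(block)

def wrong(A):
    """The constant 17 from the text, regardless of A."""
    return Fraction(17)

def right(A):
    """The choice P(A), which is a probability measure as A varies."""
    return prob(A)

print(version(omega, 0, wrong))   # total mass at a point of B_1, first version
print(version(omega, 0, right))   # total mass at a point of B_1, second version
print(version({1}, 1, wrong) + version({2, 3}, 1, wrong))  # additivity on B_2
```

Both choices satisfy (33.8), since they differ only on a set of probability 0; only the second assigns total mass 1 at every point.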

It might be thought that for an arbitrary σ-field $\mathscr{G}$ in $\mathscr{F}$ versions of the
various $P[A\|\mathscr{G}]$ can be so chosen that $P[A\|\mathscr{G}]_\omega$ is for each fixed $\omega$ a
probability measure as $A$ varies over $\mathscr{F}$. It is possible to construct a
counterexample showing that this is not so.† The example is possible because
the exceptional $\omega$-set of probability 0 where (33.22) fails depends on the
sequence $A_1, A_2, \ldots$; if there are uncountably many such sequences, it can
happen that the union of these exceptional sets has positive probability
whatever versions $P[A\|\mathscr{G}]$ are chosen.

The existence of such pathological examples turns out not to matter.
Example 33.9 illustrates the reason why. From the assumption (33.18) the
notably stronger conclusion (33.20) was reached. Since the set $[X_u \in H]$ is
fixed throughout the argument, it does not matter that conditional probabilities
may not, in fact, be measures. What does matter for the theory is
Theorem 33.2 and its extensions.

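Consider a point $\omega_0$ with the property that $P(G) > 0$ for every $G$ in $\mathscr{G}$
that contains $\omega_0$. This will be true if the one-point set $\{\omega_0\}$ lies in $\mathscr{F}$ and has
positive probability. Fix any versions of the $P[A\|\mathscr{G}]$. For each $A$ the set
$[\omega: P[A\|\mathscr{G}]_\omega < 0]$ lies in $\mathscr{G}$ and has probability 0; it therefore cannot contain
$\omega_0$. Thus $P[A\|\mathscr{G}]_{\omega_0} \ge 0$. Similarly, $P[\Omega\|\mathscr{G}]_{\omega_0} = 1$, and, if the $A_n$ are disjoint,
$P[\bigcup_n A_n\|\mathscr{G}]_{\omega_0} = \sum_n P[A_n\|\mathscr{G}]_{\omega_0}$. Therefore, $P[A\|\mathscr{G}]_{\omega_0}$ is a probability
measure as $A$ ranges over $\mathscr{F}$.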

Thus conditional probabilities behave like probabilities at points of posi- 
tive probability. That they may not do so at points of probability 0 causes no 
problem because individual such points have no effect on the probabilities of 
sets. Of course, sets of points individually having probability 0 do have an 
effect, but here the global point of view reenters. 


Conditional Probability Distributions


Let $X$ be a random variable on $(\Omega, \mathscr{F}, P)$, and let $\mathscr{G}$ be a σ-field in $\mathscr{F}$.


Theorem 33.3. There exists a function $\mu(H, \omega)$, defined for $H$ in $\mathscr{R}^1$ and
$\omega$ in $\Omega$, with these two properties:

(i) For each $\omega$ in $\Omega$, $\mu(\cdot, \omega)$ is a probability measure on $\mathscr{R}^1$.
(ii) For each $H$ in $\mathscr{R}^1$, $\mu(H, \cdot)$ is a version of $P[X \in H\|\mathscr{G}]$.

The probability measure $\mu(\cdot, \omega)$ is a conditional distribution of $X$ given $\mathscr{G}$.
If $\mathscr{G} = \sigma(Z)$, it is a conditional distribution of $X$ given $Z$.


PROOF. For each rational $r$, let $F(r, \omega)$ be a version of $P[X \le r\|\mathscr{G}]_\omega$. If
$r \le s$, then by (33.23),

(33.31)    $F(r, \omega) \le F(s, \omega)$

for $\omega$ outside a $\mathscr{G}$-set $A_{rs}$ of probability 0. By (33.26),

(33.32)    $F(r, \omega) = \lim_n F(r + n^{-1}, \omega)$

for $\omega$ outside a $\mathscr{G}$-set $B_r$ of probability 0. Finally, by (33.25) and (33.26),

(33.33)    $\lim_{r \to -\infty} F(r, \omega) = 0$,  $\lim_{r \to \infty} F(r, \omega) = 1$

outside a $\mathscr{G}$-set $C$ of probability 0. As there are only countably many of these
exceptional sets, their union $E$ lies in $\mathscr{G}$ and has probability 0.

For $\omega \notin E$ extend $F(\cdot, \omega)$ to all of $R^1$ by setting $F(x, \omega) = \inf[F(r, \omega):
x < r]$. For $\omega \in E$ take $F(x, \omega) = F(x)$, where $F$ is some arbitrary but fixed
distribution function. Suppose that $\omega \notin E$. By (33.31) and (33.32), $F(x, \omega)$
agrees with the first definition on the rationals and is nondecreasing; it is
right-continuous; and by (33.33) it is a probability distribution function.
Therefore, there exists a probability measure $\mu(\cdot, \omega)$ on $(R^1, \mathscr{R}^1)$ with
distribution function $F(\cdot, \omega)$. For $\omega \in E$, let $\mu(\cdot, \omega)$ be the probability
measure corresponding to $F(x)$. Then condition (i) is satisfied.

The class of $H$ for which $\mu(H, \cdot)$ is measurable $\mathscr{G}$ is a λ-system
containing the sets $H = (-\infty, r]$ for rational $r$; therefore $\mu(H, \cdot)$ is measurable
$\mathscr{G}$ for $H$ in $\mathscr{R}^1$.

By construction, $\mu((-\infty, r], \omega) = P[X \le r\|\mathscr{G}]_\omega$ with probability 1 for rational
$r$; that is, for $H = (-\infty, r]$ as well as for $H = R^1$,

    $\int_G \mu(H, \omega)\,P(d\omega) = P([X \in H] \cap G)$

for all $G$ in $\mathscr{G}$. Fix $G$. Each side of this equation is a measure as a function
of $H$, and so the equation must hold for all $H$ in $\mathscr{R}^1$. ■

†The argument is outlined in Problem 33.11. It depends on the construction of certain
nonmeasurable sets.


Example 33.12. Let $X$ and $Y$ be random variables whose joint distribution
$\nu$ in $R^2$ has density $f(x, y)$ with respect to Lebesgue measure: $P[(X, Y)
\in A] = \nu(A) = \iint_A f(x, y)\,dx\,dy$. Let $g(x, y) = f(x, y)/\int_{R^1} f(x, t)\,dt$, and let
$\mu(H, x) = \int_H g(x, y)\,dy$ have probability density $g(x, \cdot)$; if $\int_{R^1} f(x, t)\,dt = 0$,
let $\mu(\cdot, x)$ be an arbitrary probability measure on the line. Then $\mu(H, X(\omega))$
will serve as the conditional distribution of $Y$ given $X$. Indeed, (33.10) is the
same thing as $\int_{E \times R^1} \mu(F, x)\,d\nu(x, y) = \nu(E \times F)$, and a change of variable
gives $\int_{[X \in E]} \mu(F, X(\omega))\,P(d\omega) = P[X \in E, Y \in F]$. Thus $\mu(F, X(\omega))$ is a
version of $P[Y \in F\|X]_\omega$. This is a new version of Example 33.5. ■




PROBLEMS 


33.1. 20.27↑ Borel's paradox. Suppose that a random point on the sphere is
specified by longitude $\Theta$ and latitude $\Phi$, but restrict $\Theta$ by $0 \le \Theta < \pi$, so that $\Theta$
specifies the complete meridian circle (not semicircle) containing the point,
and compensate by letting $\Phi$ range over $(-\pi, \pi]$.

(a) Show that for given $\Theta$ the conditional distribution of $\Phi$ has density
$\frac{1}{4}|\cos \phi|$ over $(-\pi, +\pi]$. If the point lies on, say, the meridian circle through
Greenwich, it is therefore not uniformly distributed over that great circle.

(b) Show that for given $\Phi$ the conditional distribution of $\Theta$ is uniform over
$[0, \pi)$. If the point lies on the equator ($\Phi$ is 0 or $\pi$), it is therefore uniformly
distributed over that great circle.

Since the point is uniformly distributed over the spherical surface and
great circles are indistinguishable, (a) and (b) stand in apparent contradiction.
This shows again the inadmissibility of conditioning with respect to an isolated
event of probability 0. The relevant σ-field must not be lost sight of.


33.2. 20.16↑ Let $X$ and $Y$ be independent, each having the standard normal
distribution, and let $(R, \Theta)$ be the polar coordinates for $(X, Y)$.

(a) Show that $X + Y$ and $X - Y$ are independent and that $R^2 = [(X + Y)^2 +
(X - Y)^2]/2$, and conclude that the conditional distribution of $R^2$ given $X - Y$
is the chi-squared distribution with one degree of freedom translated by
$(X - Y)^2/2$.

(b) Show that the conditional distribution of $R^2$ given $\Theta$ is chi-squared with
two degrees of freedom.

(c) If $X - Y = 0$, the conditional distribution of $R^2$ is chi-squared with one
degree of freedom. If $\Theta = \pi/4$ or $\Theta = 5\pi/4$, the conditional distribution of
$R^2$ is chi-squared with two degrees of freedom. But the events $[X - Y = 0]$ and
$[\Theta = \pi/4] \cup [\Theta = 5\pi/4]$ are the same. Resolve the apparent contradiction.


33.3. ↑ Paradoxes of a somewhat similar kind arise in very simple cases.

(a) Of three prisoners, call them 1, 2, and 3, two have been chosen by lot for
execution. Prisoner 3 says to the guard, "Which of 1 and 2 is to be executed?
One of them will be, and you give me no information about myself in telling
me which it is." The guard finds this reasonable and says, "Prisoner 1 is to be
executed." And now 3 reasons, "I know that 1 is to be executed; the other will
be either 2 or me, and so my chance of being executed is now only $\frac{1}{2}$ instead of
the $\frac{2}{3}$ it was before." Apparently, the guard has given him information.

If one looks for a σ-field, it must be the one describing the guard's answer,
and it then becomes clear that the sample space is incompletely specified.
Suppose that, if 1 and 2 are to be executed, the guard's response is "1" with
probability $p$ and "2" with probability $1 - p$; and, of course, suppose that, if 3
is to be executed, the guard names the other victim. Calculate the conditional
probabilities.

(b) Assume that among families with two children the four sex distributions
are equally likely. You have been introduced to one of the two children in such
a family, and he is a boy. What is the conditional probability that the other is a
boy as well?


33.4. (a) Consider probability spaces $(\Omega, \mathscr{F}, P)$ and $(\Omega', \mathscr{F}', P')$; suppose that $T$:
$\Omega \to \Omega'$ is measurable $\mathscr{F}/\mathscr{F}'$ and $P' = PT^{-1}$. Let $\mathscr{G}'$ be a σ-field in $\mathscr{F}'$, and
take $\mathscr{G}$ to be the σ-field $[T^{-1}G': G' \in \mathscr{G}']$. For $A' \in \mathscr{F}'$, show by (16.18) that
$P[T^{-1}A'\|\mathscr{G}]_\omega = P'[A'\|\mathscr{G}']_{T\omega}$ with $P$-probability 1.

(b) Now take $(\Omega', \mathscr{F}', P') = (R^2, \mathscr{R}^2, \mu)$, where $\mu$ is the distribution of a
random vector $(X, Y)$ on $(\Omega, \mathscr{F}, P)$. Suppose that $(X, Y)$ has density $f$, and
show by (33.9) that

    $P[Y \in F\|X]_\omega = \frac{\int_F f(X(\omega), t)\,dt}{\int_{R^1} f(X(\omega), t)\,dt}$

with probability 1.


33.5. ↑ (a) There is a slightly different approach to conditional probability. Let
$(\Omega, \mathscr{F}, P)$ be a probability space, $(\Omega', \mathscr{F}')$ a measurable space, and $T$: $\Omega \to \Omega'$
a mapping measurable $\mathscr{F}/\mathscr{F}'$. Define a measure $\nu$ on $\mathscr{F}'$ by $\nu(A') = P(A \cap
T^{-1}A')$ for $A' \in \mathscr{F}'$. Prove that there exists a function $p(A|\omega')$ on $\Omega'$, measurable
$\mathscr{F}'$ and integrable $PT^{-1}$, such that $\int_{A'} p(A|\omega')\,PT^{-1}(d\omega') = P(A \cap T^{-1}A')$
for all $A'$ in $\mathscr{F}'$. Intuitively, $p(A|\omega')$ is the conditional probability that $\omega \in A$
for someone who knows that $T\omega = \omega'$. Let $\mathscr{G} = [T^{-1}A': A' \in \mathscr{F}']$; show that $\mathscr{G}$
is a σ-field and that $p(A|T\omega)$ is a version of $P[A\|\mathscr{G}]_\omega$.

(b) Connect this with part (a) of the preceding problem.


33.6. ↑ Suppose that $T = X$ is a random variable, $(\Omega', \mathscr{F}') = (R^1, \mathscr{R}^1)$, and $x$ is
the general point of $R^1$. In this case $p(A|x)$ is sometimes written $P[A|X = x]$.
What is the problem with this notation?


33.7. For the Poisson process (see Example 33.1) show that for $0 < s < t$,

    $P[N_s = k\|N_t] = \begin{cases} \dbinom{N_t}{k} \left(\dfrac{s}{t}\right)^k \left(1 - \dfrac{s}{t}\right)^{N_t - k}, & k \le N_t, \\ 0, & k > N_t. \end{cases}$

Thus the conditional distribution (in the sense of Theorem 33.3) of $N_s$ given $N_t$
is binomial with parameters $N_t$ and $s/t$.


33.8. 29.12↑ Suppose that $(X_1, X_2)$ has the centered normal distribution, that is, has in
the plane the distribution with density (29.10). Express the quadratic form in
the exponential as

    $\frac{x_1^2}{\sigma_{11}} + \frac{\sigma_{11}}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left( x_2 - \frac{\sigma_{12}}{\sigma_{11}} x_1 \right)^2$,

integrate out the $x_2$ and show that

    $\frac{f(x_1, x_2)}{\int_{-\infty}^{\infty} f(x_1, t)\,dt} = \frac{1}{\sqrt{2\pi\tau}} \exp\left[ -\frac{1}{2\tau} \left( x_2 - \frac{\sigma_{12}}{\sigma_{11}} x_1 \right)^2 \right]$,

where $\tau = \sigma_{22} - \sigma_{12}^2 \sigma_{11}^{-1}$. Describe the conditional distribution of $X_2$ given $X_1$.


33.9. (a) Suppose that $\mu(H, \omega)$ has property (i) in Theorem 33.3, and suppose that
$\mu(H, \cdot)$ is a version of $P[X \in H\|\mathscr{G}]$ for $H$ in a π-system generating $\mathscr{R}^1$. Show
that $\mu(\cdot, \omega)$ is a conditional distribution of $X$ given $\mathscr{G}$.

(b) Use Theorem 12.5 to extend Theorem 33.3 from $R^1$ to $R^k$.

(c) Show that conditional probabilities can be defined as genuine probabilities
on spaces of the special form $(\Omega, \sigma(X_1, \ldots, X_k), P)$.


33.10. ↑ Deduce from (33.16) that the conditional distribution of $X$ given $M$ is

    $\frac{1}{2} I_{[M(\omega) \in H]} + \frac{1}{2} \frac{\mu(H \cap (-\infty, M(\omega)])}{\mu((-\infty, M(\omega)])}$,

where $\mu$ is the distribution corresponding to $F$ (positive and continuous). Hint:
First check $H = (-\infty, x]$.


33.11. 4.10 12.4↑ The following construction shows that conditional probabilities
may not give measures. Complete the details.

In Problem 4.10 it is shown that there exist a probability space $(\Omega, \mathscr{F}, P)$, a
σ-field $\mathscr{G}$ in $\mathscr{F}$, and a set $H$ in $\mathscr{F}$ such that $P(H) = \frac{1}{2}$, $H$ and $\mathscr{G}$ are
independent, $\mathscr{G}$ contains all the singletons, and $\mathscr{G}$ is generated by a countable
subclass. The countable subclass generating $\mathscr{G}$ can be taken to be a π-system
$\mathscr{P} = \{B_1, B_2, \ldots\}$ (pass to the finite intersections of the sets in the original
class).

Assume that it is possible to choose versions $P[A\|\mathscr{G}]$ so that $P[A\|\mathscr{G}]_\omega$ is
for each $\omega$ a probability measure as $A$ varies over $\mathscr{F}$. Let $C_n$ be the $\omega$-set
where $P[B_n\|\mathscr{G}]_\omega = I_{B_n}(\omega)$; show (Example 33.3) that $C = \bigcap_n C_n$ has probability 1.
Show that $\omega \in C$ implies that $P[G\|\mathscr{G}]_\omega = I_G(\omega)$ for all $G$ in $\mathscr{G}$ and
hence that $P[\{\omega\}\|\mathscr{G}]_\omega = 1$.

Now $\omega \in H \cap C$ implies that $P[H\|\mathscr{G}]_\omega \ge P[\{\omega\}\|\mathscr{G}]_\omega = 1$ and $\omega \in H^c \cap C$
implies that $P[H\|\mathscr{G}]_\omega \le P[\Omega - \{\omega\}\|\mathscr{G}]_\omega = 0$. Thus $\omega \in C$ implies that
$P[H\|\mathscr{G}]_\omega = I_H(\omega)$. But since $H$ and $\mathscr{G}$ are independent, $P[H\|\mathscr{G}]_\omega =
P(H) = \frac{1}{2}$ with probability 1, a contradiction.

This example is related to Example 4.10 but concerns mathematical fact
instead of heuristic interpretation.


33.12. Let $\alpha$ and $\beta$ be σ-finite measures on the line, and let $f(x, y)$ be a probability
density with respect to $\alpha \times \beta$. Define

(33.34)    $g_x(y) = \frac{f(x, y)}{\int_{R^1} f(x, t)\,\beta(dt)}$,

unless the denominator vanishes, in which case take $g_x(y) = 0$, say. Show that,
if $(X, Y)$ has density $f$ with respect to $\alpha \times \beta$, then the conditional distribution
of $Y$ given $X$ has density $g_X(y)$ with respect to $\beta$. This generalizes Examples
33.5 and 33.12, where $\alpha$ and $\beta$ are Lebesgue measure.


18207? Suppose that u and v, (one for each real x) are probability measures 
on the line, and suppose that v,( B) is a Borel function in x for each B € .@!. 
Then (see Problem 18.20) 


(33.35) T( E) = J rely: (x, y) e E]u(dx) 


defines a probability measure on ( R?, A’). 
Suppose that (X,Y) has distribution 77, and show that vy is a version of the 
conditional distribution of Y given X. 


33.14. ↑ Let $\alpha$ and $\beta$ be $\sigma$-finite measures on the line. Specialize the setup of Problem 33.13 by supposing that $\mu$ has density $f(x)$ with respect to $\alpha$ and $\nu_x$ has density $g_x(y)$ with respect to $\beta$. Assume that $g_x(y)$ is measurable $\mathscr{R}^2$ in the pair $(x, y)$, so that $\nu_x(B)$ is automatically measurable in $x$. Show that (33.35) has density $f(x)g_x(y)$ with respect to $\alpha \times \beta$: $\pi(E) = \iint_E f(x)g_x(y)\,\alpha(dx)\,\beta(dy)$. Show that (33.34) is consistent with $f(x, y) = f(x)g_x(y)$. Put

$$p_y(x) = \frac{f(x)g_x(y)}{\int_{R^1} f(s)g_s(y)\,\alpha(ds)}.$$

Suppose that $(X, Y)$ has density $f(x)g_x(y)$ with respect to $\alpha \times \beta$, and show that $p_Y(x)$ is a density with respect to $\alpha$ for the conditional distribution of $X$ given $Y$.

In the language of Bayes, $f(x)$ is the prior density of a parameter $x$, $g_x(y)$ is the conditional density of the observation $y$ given the parameter, and $p_y(x)$ is the posterior density of the parameter given the observation.
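The prior/likelihood/posterior relation can be tried out on a finite parameter set. The following is a minimal sketch, assuming $\alpha$ and $\beta$ are counting measures on finite sets so that the integral in the denominator of $p_y(x)$ becomes a sum; the function name `posterior` and the two-coin data are hypothetical and serve only as illustration.

```python
from fractions import Fraction

def posterior(prior, likelihood, y):
    """Return p_y(x) = f(x) g_x(y) / sum_s f(s) g_s(y) over a finite parameter set.

    prior[x] plays the role of f(x); likelihood[x][y] plays the role of g_x(y).
    If the denominator vanishes, the value 0 is used, as in (33.34).
    """
    joint = {x: prior[x] * likelihood[x][y] for x in prior}
    total = sum(joint.values())
    if total == 0:
        return {x: Fraction(0) for x in prior}
    return {x: joint[x] / total for x in prior}

# Hypothetical data: a fair coin and a biased coin, equally likely a priori.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {
    "fair": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}
post_heads = posterior(prior, likelihood, "H")
print(post_heads)
```

Exact rational arithmetic is used so that the posterior masses sum to 1 without rounding error.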


33.15. ↑ Now suppose that $\alpha$ and $\beta$ are Lebesgue measure, that $f(x)$ is positive, continuous, and bounded, and that $g_x(y) = e^{-n(y-x)^2/2}/\sqrt{2\pi/n}$. Thus the observation is distributed as the average of $n$ independent normal variables with mean $x$ and variance 1. Show that

$$\frac{1}{\sqrt{n}}\,p_y\!\left(y + \frac{x}{\sqrt{n}}\right) \to \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$

for fixed $x$ and $y$. Thus the posterior density is approximately that of a normal distribution with mean $y$ and variance $1/n$.


33.16. 32.13↑ Suppose that $X$ has distribution $\mu$. Now $P[A \| X]_\omega = f(X(\omega))$ for some Borel function $f$. Show that $\lim_{h \downarrow 0} P[A \mid x - h < X \le x + h] = f(x)$ for $x$ in a set of $\mu$-measure 1. Roughly speaking, $P[A \mid x - h < X \le x + h] \to P[A \mid X = x]$. Hint: Take $\nu(B) = P(A \cap [X \in B])$ in Problem 32.13.


SECTION 34. CONDITIONAL EXPECTATION 


In this section the theory of conditional expectation is developed from first 
principles. The properties of conditional probabilities will then follow as 
special cases. The preceding section was long only because of the examples in 
it; the theory itself is quite compact. 


Definition 


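Suppose that $X$ is an integrable random variable on $(\Omega, \mathscr{F}, P)$ and that $\mathscr{G}$ is a $\sigma$-field in $\mathscr{F}$. There exists a random variable $E[X \| \mathscr{G}]$, called the conditional expected value of $X$ given $\mathscr{G}$, having these two properties:


(i) $E[X \| \mathscr{G}]$ is measurable $\mathscr{G}$ and integrable.
(ii) $E[X \| \mathscr{G}]$ satisfies the functional equation


(34.1) $$\int_G E[X \| \mathscr{G}]\,dP = \int_G X\,dP, \qquad G \in \mathscr{G}.$$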


To prove the existence of such a random variable, consider first the case of nonnegative $X$. Define a measure $\nu$ on $\mathscr{G}$ by $\nu(G) = \int_G X\,dP$. This measure is finite because $X$ is integrable, and it is absolutely continuous with respect to $P$. By the Radon-Nikodym theorem there is a function $f$, measurable $\mathscr{G}$, such that $\nu(G) = \int_G f\,dP$.† This $f$ has properties (i) and (ii). If $X$ is not necessarily nonnegative, $E[X^+ \| \mathscr{G}] - E[X^- \| \mathscr{G}]$ clearly has the required properties.

There will in general be many such random variables $E[X \| \mathscr{G}]$; any one of them is called a version of the conditional expected value. Any two versions are equal with probability 1 (by Theorem 16.10 applied to $P$ restricted to $\mathscr{G}$).

Arguments like those in Examples 33.3 and 33.4 show that $E[X \| \{\emptyset, \Omega\}] = E[X]$ and that $E[X \| \mathscr{F}] = X$ with probability 1. As $\mathscr{G}$ increases, condition (i) becomes weaker and condition (ii) becomes stronger.

The value $E[X \| \mathscr{G}]_\omega$ at $\omega$ is to be interpreted as the expected value of $X$ for someone who knows for each $G$ in $\mathscr{G}$ whether or not it contains the point $\omega$, which itself in general remains unknown. Condition (i) ensures that $E[X \| \mathscr{G}]$ can in principle be calculated from this partial information alone. Condition (ii) can be restated as $\int_G (X - E[X \| \mathscr{G}])\,dP = 0$; if the observer, in possession of the partial information contained in $\mathscr{G}$, is offered the opportunity to bet, paying an entry fee of $E[X \| \mathscr{G}]$ and being returned the amount $X$, and if he adopts the strategy of betting if $G$ occurs, this equation says that the game is fair.


†As in the case of conditional probabilities, the integral is the same on $(\Omega, \mathscr{F}, P)$ as on $(\Omega, \mathscr{G})$ with $P$ restricted to $\mathscr{G}$ (Example 16.4).




Example 34.1. Suppose that $B_1, B_2, \ldots$ is a finite or countable partition of $\Omega$ generating the $\sigma$-field $\mathscr{G}$. Then $E[X \| \mathscr{G}]$ must, since it is measurable $\mathscr{G}$, have some constant value over $B_i$, say $a_i$. Then (34.1) for $G = B_i$ gives $a_i P(B_i) = \int_{B_i} X\,dP$. Thus


(34.2) $$E[X \| \mathscr{G}]_\omega = \frac{1}{P(B_i)} \int_{B_i} X\,dP, \qquad \omega \in B_i, \quad P(B_i) > 0.$$


If $P(B_i) = 0$, the value of $E[X \| \mathscr{G}]$ over $B_i$ is constant but arbitrary. ■
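Formula (34.2) can be computed directly on a finite sample space. The sketch below is an illustration under stated assumptions: $\Omega$ is a finite set with point masses `prob`, the partition is given as a list of Python sets, and the function name `cond_exp_partition` and the fair-die data are hypothetical. On blocks of probability 0 the sketch returns 0, which is one admissible arbitrary choice.

```python
from fractions import Fraction

def cond_exp_partition(prob, X, blocks):
    """Return E[X || G] as a dict on Omega, where G is generated by `blocks`.

    On each block B_i with P(B_i) > 0 the value is (1/P(B_i)) * sum of X over B_i
    weighted by prob, as in (34.2); on null blocks the value 0 is chosen.
    """
    result = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        if p_block > 0:
            value = sum(X[w] * prob[w] for w in block) / p_block
        else:
            value = Fraction(0)  # constant but arbitrary on a null block
        for w in block:
            result[w] = value
    return result

# Hypothetical example: a fair die, X = face value, partition into odd and even faces.
omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}
blocks = [{1, 3, 5}, {2, 4, 6}]
ce = cond_exp_partition(prob, X, blocks)

# Check the functional equation (34.1) on each generating block.
equation_holds = all(
    sum(ce[w] * prob[w] for w in B) == sum(X[w] * prob[w] for w in B)
    for B in blocks
)
print(ce, equation_holds)
```

Since every set in $\mathscr{G}$ is a union of blocks, checking (34.1) on the blocks suffices here.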


Example 34.2. For an indicator $I_A$ the defining properties of $E[I_A \| \mathscr{G}]$ and $P[A \| \mathscr{G}]$ coincide; therefore, $E[I_A \| \mathscr{G}] = P[A \| \mathscr{G}]$ with probability 1. It is easily checked that, more generally, $E[X \| \mathscr{G}] = \sum_i a_i P[A_i \| \mathscr{G}]$ with probability 1 for a simple function $X = \sum_i a_i I_{A_i}$. ■


In analogy with the case of conditional probability, if $[X_t, t \in T]$ is a collection of random variables, $E[X \| X_t, t \in T]$ is by definition $E[X \| \mathscr{G}]$ with $\sigma[X_t, t \in T]$ in the role of $\mathscr{G}$.


Example 34.3. Let $\mathscr{I}$ be the $\sigma$-field of sets invariant under a measure-preserving transformation $T$ on $(\Omega, \mathscr{F}, P)$. For $f$ integrable, the limit $\hat{f}$ in (24.7) is $E[f \| \mathscr{I}]$: Since $\hat{f}$ is invariant, it is measurable $\mathscr{I}$. If $G$ is invariant, then the averages $a_n$ in the proof of the ergodic theorem (p. 318) satisfy $E[I_G a_n] = E[I_G f]$. But since the $a_n$ converge to $\hat{f}$ and are uniformly integrable, $E[I_G \hat{f}] = E[I_G f]$. ■


Properties of Conditional Expectation 
To prove the first result, apply Theorem 16.10(iii) to $f$ and $E[X \| \mathscr{G}]$ on $(\Omega, \mathscr{G}, P)$.


Theorem 34.1. Let $\mathscr{P}$ be a $\pi$-system generating the $\sigma$-field $\mathscr{G}$, and suppose that $\Omega$ is a finite or countable union of sets in $\mathscr{P}$. An integrable function $f$ is a version of $E[X \| \mathscr{G}]$ if it is measurable $\mathscr{G}$ and if


(34.3) $$\int_G f\,dP = \int_G X\,dP$$


holds for all $G$ in $\mathscr{P}$.


In most applications it is clear that $\Omega \in \mathscr{P}$.


All the equalities and inequalities in the following theorem hold with 
probability 1. 


Theorem 34.2. Suppose that $X$, $Y$, $X_n$ are integrable.


(i) If $X = a$ with probability 1, then $E[X \| \mathscr{G}] = a$.
(ii) For constants $a$ and $b$, $E[aX + bY \| \mathscr{G}] = aE[X \| \mathscr{G}] + bE[Y \| \mathscr{G}]$.
(iii) If $X \le Y$ with probability 1, then $E[X \| \mathscr{G}] \le E[Y \| \mathscr{G}]$.
(iv) $|E[X \| \mathscr{G}]| \le E[|X| \,\|\, \mathscr{G}]$.
(v) If $\lim_n X_n = X$ with probability 1, $|X_n| \le Y$, and $Y$ is integrable, then $\lim_n E[X_n \| \mathscr{G}] = E[X \| \mathscr{G}]$ with probability 1.


PROOF. If $X = a$ with probability 1, the function identically equal to $a$ satisfies conditions (i) and (ii) in the definition of $E[X \| \mathscr{G}]$, and so (i) above follows by uniqueness.

As for (ii), $aE[X \| \mathscr{G}] + bE[Y \| \mathscr{G}]$ is integrable and measurable $\mathscr{G}$, and


$$\int_G (aE[X \| \mathscr{G}] + bE[Y \| \mathscr{G}])\,dP = a\int_G E[X \| \mathscr{G}]\,dP + b\int_G E[Y \| \mathscr{G}]\,dP = a\int_G X\,dP + b\int_G Y\,dP = \int_G (aX + bY)\,dP$$


for all $G$ in $\mathscr{G}$, so that this function satisfies the functional equation.

If $X \le Y$ with probability 1, then $\int_G (E[Y \| \mathscr{G}] - E[X \| \mathscr{G}])\,dP = \int_G (Y - X)\,dP \ge 0$ for all $G$ in $\mathscr{G}$. Since $E[Y \| \mathscr{G}] - E[X \| \mathscr{G}]$ is measurable $\mathscr{G}$, it must be nonnegative with probability 1 (consider the set $G$ where it is negative). This proves (iii), which clearly implies (iv) as well as the fact that $E[X \| \mathscr{G}] = E[Y \| \mathscr{G}]$ if $X = Y$ with probability 1.

To prove (v), consider $Z_n = \sup_{k \ge n} |X_k - X|$. Now $Z_n \downarrow 0$ with probability 1, and by (ii), (iii), and (iv), $|E[X_n \| \mathscr{G}] - E[X \| \mathscr{G}]| \le E[Z_n \| \mathscr{G}]$. It suffices, therefore, to show that $E[Z_n \| \mathscr{G}] \downarrow 0$ with probability 1. By (iii) the sequence $E[Z_n \| \mathscr{G}]$ is nonincreasing and hence has a limit $Z$; the problem is to prove that $Z = 0$ with probability 1, or, $Z$ being nonnegative, that $E[Z] = 0$. But $0 \le Z_n \le 2Y$, and so (34.1) and the dominated convergence theorem give $E[Z] = \int E[Z \| \mathscr{G}]\,dP \le \int E[Z_n \| \mathscr{G}]\,dP = E[Z_n] \to 0$. ■


The properties (33.21) through (33.28) can be derived anew from Theorem 34.2. Part (ii) shows once again that $E[\sum_i a_i I_{A_i} \| \mathscr{G}] = \sum_i a_i P[A_i \| \mathscr{G}]$ for simple functions.

If $X$ is measurable $\mathscr{G}$, then clearly $E[X \| \mathscr{G}] = X$ with probability 1. The following generalization of this is used constantly. For an observer with the information in $\mathscr{G}$, $X$ is effectively a constant if it is measurable $\mathscr{G}$:


Theorem 34.3. If $X$ is measurable $\mathscr{G}$, and if $Y$ and $XY$ are integrable, then
(34.4) $$E[XY \| \mathscr{G}] = XE[Y \| \mathscr{G}]$$
with probability 1.




PROOF. It will be shown first that the right side of (34.4) is a version of the left side if $X = I_{G_0}$ and $G_0 \in \mathscr{G}$. Since $I_{G_0}E[Y \| \mathscr{G}]$ is certainly measurable $\mathscr{G}$, it suffices to show that it satisfies the functional equation $\int_G I_{G_0}E[Y \| \mathscr{G}]\,dP = \int_G I_{G_0}Y\,dP$, $G \in \mathscr{G}$. But this reduces to $\int_{G \cap G_0} E[Y \| \mathscr{G}]\,dP = \int_{G \cap G_0} Y\,dP$, which holds by the definition of $E[Y \| \mathscr{G}]$. Thus (34.4) holds if $X$ is the indicator of an element of $\mathscr{G}$.

It follows by Theorem 34.2(ii) that (34.4) holds if $X$ is a simple function measurable $\mathscr{G}$. For the general $X$ that is measurable $\mathscr{G}$, there exist simple functions $X_n$, measurable $\mathscr{G}$, such that $|X_n| \le |X|$ and $\lim_n X_n = X$ (Theorem 13.5). Since $|X_nY| \le |XY|$ and $|XY|$ is integrable, Theorem 34.2(v) implies that $\lim_n E[X_nY \| \mathscr{G}] = E[XY \| \mathscr{G}]$ with probability 1. But $E[X_nY \| \mathscr{G}] = X_nE[Y \| \mathscr{G}]$ by the case already treated, and of course $\lim_n X_nE[Y \| \mathscr{G}] = XE[Y \| \mathscr{G}]$. (Note that $|X_nE[Y \| \mathscr{G}]| = |E[X_nY \| \mathscr{G}]| \le E[|X_nY| \,\|\, \mathscr{G}] \le E[|XY| \,\|\, \mathscr{G}]$, so that the limit $XE[Y \| \mathscr{G}]$ is integrable.) Thus (34.4) holds in general. Notice that $X$ has not been assumed integrable. ■


Taking a conditional expected value can be thought of as an averaging or smoothing operation. This leads one to expect that averaging $X$ with respect to $\mathscr{G}_2$ and then averaging the result with respect to a coarser (smaller) $\sigma$-field $\mathscr{G}_1$ should lead to the same result as would averaging with respect to $\mathscr{G}_1$ in the first place:


Theorem 34.4. If $X$ is integrable and the $\sigma$-fields $\mathscr{G}_1$ and $\mathscr{G}_2$ satisfy $\mathscr{G}_1 \subset \mathscr{G}_2$, then


(34.5) $$E[E[X \| \mathscr{G}_2] \| \mathscr{G}_1] = E[X \| \mathscr{G}_1]$$


with probability 1.


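PROOF. The left side of (34.5) is measurable $\mathscr{G}_1$, and so to prove that it is a version of $E[X \| \mathscr{G}_1]$, it is enough to verify $\int_G E[E[X \| \mathscr{G}_2] \| \mathscr{G}_1]\,dP = \int_G X\,dP$ for $G \in \mathscr{G}_1$. But if $G \in \mathscr{G}_1$, then $G \in \mathscr{G}_2$, and the left side here is $\int_G E[X \| \mathscr{G}_2]\,dP = \int_G X\,dP$. ■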


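If $\mathscr{G}_2 = \mathscr{F}$, then $E[X \| \mathscr{G}_2] = X$, so that (34.5) is trivial. If $\mathscr{G}_1 = \{\emptyset, \Omega\}$ and $\mathscr{G}_2 = \mathscr{G}$, then (34.5) becomes


(34.6) $$E[E[X \| \mathscr{G}]] = E[X],$$


the special case of (34.1) for $G = \Omega$.

If $\mathscr{G}_1 \subset \mathscr{G}_2$, then $E[X \| \mathscr{G}_1]$, being measurable $\mathscr{G}_1$, is also measurable $\mathscr{G}_2$, so that taking an expected value with respect to $\mathscr{G}_2$ does not alter it: $E[E[X \| \mathscr{G}_1] \| \mathscr{G}_2] = E[X \| \mathscr{G}_1]$. Therefore, if $\mathscr{G}_1 \subset \mathscr{G}_2$, taking iterated expected values in either order gives $E[X \| \mathscr{G}_1]$.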
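The iterated-expectation identity (34.5) can be checked exactly when both $\sigma$-fields are generated by finite partitions, one refining the other. The sketch below assumes a finite $\Omega$ with every block of positive probability; the helper `cond_exp` and the die example are hypothetical names and data chosen for illustration.

```python
from fractions import Fraction

def cond_exp(prob, X, blocks):
    """E[X || G] on a finite Omega, with G generated by the partition `blocks`.

    Assumes every block has positive probability.
    """
    out = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        value = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            out[w] = value
    return out

# Hypothetical example: fair die, X = square of the face value.
omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}
coarse = [{1, 2}, {3, 4, 5, 6}]      # generates G_1
fine = [{1}, {2}, {3, 4}, {5, 6}]    # generates G_2, which contains G_1

inner = cond_exp(prob, X, fine)            # E[X || G_2]
lhs = cond_exp(prob, inner, coarse)        # E[E[X || G_2] || G_1]
rhs = cond_exp(prob, X, coarse)            # E[X || G_1]
other_order = cond_exp(prob, rhs, fine)    # E[E[X || G_1] || G_2]
print(lhs == rhs, other_order == rhs)
```

Both orders of iteration return the coarser conditional expectation, matching the remark that follows Theorem 34.4.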




The remaining result of a general sort needed here is Jensen's inequality for conditional expected values: If $\varphi$ is a convex function on the line and $X$ and $\varphi(X)$ are both integrable, then


(34.7) $$\varphi(E[X \| \mathscr{G}]) \le E[\varphi(X) \| \mathscr{G}]$$


with probability 1. For each $x_0$ take a support line [A33] through $(x_0, \varphi(x_0))$: $\varphi(x_0) + A(x_0)(x - x_0) \le \varphi(x)$. The slope $A(x_0)$ can be taken as the right-hand derivative of $\varphi$, so that it is nondecreasing in $x_0$. Now


$$\varphi(E[X \| \mathscr{G}]) + A(E[X \| \mathscr{G}])(X - E[X \| \mathscr{G}]) \le \varphi(X).$$


Suppose that $E[X \| \mathscr{G}]$ is bounded. Then all three terms here are integrable (if $\varphi$ is convex on $R^1$, then $\varphi$ and $A$ are bounded on bounded sets), and taking expected values with respect to $\mathscr{G}$ and using (34.4) on the middle term gives (34.7).
To prove (34.7) in general, let $G_n = [|E[X \| \mathscr{G}]| \le n]$. Then $E[I_{G_n}X \| \mathscr{G}] = I_{G_n}E[X \| \mathscr{G}]$ is bounded, and so (34.7) holds for $I_{G_n}X$: $\varphi(I_{G_n}E[X \| \mathscr{G}]) \le E[\varphi(I_{G_n}X) \| \mathscr{G}]$. Now $E[\varphi(I_{G_n}X) \| \mathscr{G}] = E[I_{G_n}\varphi(X) + I_{G_n^c}\varphi(0) \| \mathscr{G}] = I_{G_n}E[\varphi(X) \| \mathscr{G}] + I_{G_n^c}\varphi(0) \to E[\varphi(X) \| \mathscr{G}]$. Since $\varphi(I_{G_n}E[X \| \mathscr{G}])$ converges to $\varphi(E[X \| \mathscr{G}])$ by the continuity of $\varphi$, (34.7) follows. If $\varphi(x) = |x|$, (34.7) gives part (iv) of Theorem 34.2 again.
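Inequality (34.7) can be observed pointwise in a small finite example. The sketch below is only a numerical illustration, not a proof: it assumes a five-point $\Omega$ with uniform probabilities and a two-block partition (all hypothetical choices), and it tests the convex functions $|x|$ and $x^2$.

```python
from fractions import Fraction

def cond_exp(prob, X, blocks):
    """E[X || G] for G generated by a finite partition with positive-probability blocks."""
    out = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        value = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            out[w] = value
    return out

def jensen_holds(prob, X, blocks, phi):
    """Check phi(E[X || G]) <= E[phi(X) || G] at every point of Omega."""
    ce_x = cond_exp(prob, X, blocks)
    ce_phi = cond_exp(prob, {w: phi(X[w]) for w in X}, blocks)
    return all(phi(ce_x[w]) <= ce_phi[w] for w in X)

# Hypothetical data: Omega = {-2, -1, 0, 1, 3}, uniform P, X the identity.
omega = [-2, -1, 0, 1, 3]
prob = {w: Fraction(1, 5) for w in omega}
X = {w: Fraction(w) for w in omega}
blocks = [{-2, 3}, {-1, 0, 1}]

abs_ok = jensen_holds(prob, X, blocks, abs)               # Theorem 34.2(iv)
square_ok = jensen_holds(prob, X, blocks, lambda x: x * x)
print(abs_ok, square_ok)
```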


Conditional Distributions and Expectations 


Theorem 34.5. Let $\mu(\cdot, \omega)$ be a conditional distribution with respect to $\mathscr{G}$ of a random variable $X$, in the sense of Theorem 33.3. If $\varphi\colon R^1 \to R^1$ is a Borel function for which $\varphi(X)$ is integrable, then $\int_{R^1} \varphi(x)\,\mu(dx, \omega)$ is a version of $E[\varphi(X) \| \mathscr{G}]_\omega$.


PROOF. If $\varphi = I_H$ and $H \in \mathscr{R}^1$, this is an immediate consequence of the definition of conditional distribution, and by Theorem 34.2(ii) it follows for $\varphi$ a simple function over $R^1$. For the general nonnegative $\varphi$, choose simple $\varphi_n$ such that $0 \le \varphi_n(x) \uparrow \varphi(x)$ for each $x$ in $R^1$. By the case already treated, $\int_{R^1} \varphi_n(x)\,\mu(dx, \omega)$ is a version of $E[\varphi_n(X) \| \mathscr{G}]_\omega$. The integral converges by the monotone convergence theorem in $(R^1, \mathscr{R}^1, \mu(\cdot, \omega))$ to $\int_{R^1} \varphi(x)\,\mu(dx, \omega)$ for each $\omega$, the value $+\infty$ not excluded, and $E[\varphi_n(X) \| \mathscr{G}]_\omega$ converges to $E[\varphi(X) \| \mathscr{G}]_\omega$ with probability 1 by Theorem 34.2(v). Thus the result holds for nonnegative $\varphi$, and the general case follows from splitting into positive and negative parts. ■


It is a consequence of the proof above that $\int_{R^1} \varphi(x)\,\mu(dx, \omega)$ is measurable $\mathscr{G}$ and finite with probability 1. If $X$ is itself integrable, it follows by the theorem for the case $\varphi(x) = x$ that


$$E[X \| \mathscr{G}]_\omega = \int_{-\infty}^{\infty} x\,\mu(dx, \omega)$$
with probability 1. If $\varphi(X)$ is integrable as well, then
(34.8) $$E[\varphi(X) \| \mathscr{G}]_\omega = \int_{-\infty}^{\infty} \varphi(x)\,\mu(dx, \omega)$$


with probability 1. By Jensen's inequality (21.14) for unconditional expected values, the right side of (34.8) is at least $\varphi(\int_{-\infty}^{\infty} x\,\mu(dx, \omega))$ if $\varphi$ is convex. This gives another proof of (34.7).


Sufficient Subfields* 


Suppose that for each $\theta$ in an index set $\Theta$, $P_\theta$ is a probability measure on $(\Omega, \mathscr{F})$. In statistics the problem is to draw inferences about the unknown parameter $\theta$ from an observation $\omega$.

Denote by $P_\theta[A \| \mathscr{G}]$ and $E_\theta[X \| \mathscr{G}]$ conditional probabilities and expected values calculated with respect to the probability measure $P_\theta$ on $(\Omega, \mathscr{F})$. A $\sigma$-field $\mathscr{G}$ in $\mathscr{F}$ is sufficient for the family $[P_\theta\colon \theta \in \Theta]$ if versions $P_\theta[A \| \mathscr{G}]$ can be chosen that are independent of $\theta$, that is, if there exists a function $p(A, \omega)$ of $A \in \mathscr{F}$ and $\omega \in \Omega$ such that, for each $A \in \mathscr{F}$ and $\theta \in \Theta$, $p(A, \cdot)$ is a version of $P_\theta[A \| \mathscr{G}]$. There is no requirement that $p(\cdot, \omega)$ be a measure for $\omega$ fixed. The idea is that although there may be information in $\mathscr{F}$ not already contained in $\mathscr{G}$, this information is irrelevant to the drawing of inferences about $\theta$.† A sufficient statistic is a random variable or random vector $T$ such that $\sigma(T)$ is a sufficient subfield.

A family $\mathscr{M}$ of measures dominates another family $\mathscr{N}$ if, for each $A$, from $\mu(A) = 0$ for all $\mu$ in $\mathscr{M}$, it follows that $\nu(A) = 0$ for all $\nu$ in $\mathscr{N}$. If each of $\mathscr{M}$ and $\mathscr{N}$ dominates the other, they are equivalent. For sets consisting of a single measure these are the concepts introduced in Section 32.

Theorem 34.6. Suppose that $[P_\theta\colon \theta \in \Theta]$ is dominated by the $\sigma$-finite measure $\mu$. A necessary and sufficient condition for $\mathscr{G}$ to be sufficient is that the density $f_\theta$ of $P_\theta$ with respect to $\mu$ can be put in the form $f_\theta = g_\theta h$ for a $g_\theta$ measurable $\mathscr{G}$.


It is assumed throughout that $g_\theta$ and $h$ are nonnegative and of course that $h$ is measurable $\mathscr{F}$. Theorem 34.6 is called the factorization theorem, the condition being that the density $f_\theta$ splits into a factor depending on $\omega$ only through $\mathscr{G}$ and a factor independent of $\theta$. Although $g_\theta$ and $h$ are not assumed integrable $\mu$, their product $f_\theta$, as the density of a finite measure, must be. Before proceeding to the proof, consider an application.


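Example 34.4. Let $(\Omega, \mathscr{F}) = (R^k, \mathscr{R}^k)$, and for $\theta > 0$ let $P_\theta$ be the measure having with respect to $k$-dimensional Lebesgue measure the density


$$f_\theta(x) = f_\theta(x_1, \ldots, x_k) = \begin{cases} \theta^{-k} & \text{if } 0 < x_i < \theta,\ i = 1, \ldots, k, \\ 0 & \text{otherwise.} \end{cases}$$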

*This topic may be omitted. 
†See Problem 34.19.




If $X_i$ is the function on $R^k$ defined by $X_i(x) = x_i$, then under $P_\theta$, $X_1, \ldots, X_k$ are independent random variables, each uniformly distributed over $[0, \theta]$. Let $T(x) = \max_{i \le k} X_i(x)$. If $g_\theta(t)$ is $\theta^{-k}$ for $0 \le t \le \theta$ and 0 otherwise, and if $h(x)$ is 1 or 0 according as all $x_i$ are nonnegative or not, then $f_\theta(x) = g_\theta(T(x))h(x)$. The factorization criterion is thus satisfied, and $T$ is a sufficient statistic.

Sufficiency is clear on intuitive grounds as well: $\theta$ is not involved in the conditional distribution of $X_1, \ldots, X_k$ given $T$ because, roughly speaking, a random one of them equals $T$ and the others are independent and uniform over $[0, T]$. If this is true, the distribution of $X_i$ given $T$ ought to have a mass of $k^{-1}$ at $T$ and a uniform distribution of mass $1 - k^{-1}$ over $[0, T]$, so that


(34.9) $$E_\theta[X_i \| T] = \frac{1}{k}T + \frac{k-1}{k}\cdot\frac{T}{2} = \frac{k+1}{2k}T.$$


For a proof of this fact, needed later, note that by (21.9)
(34.10) $$\int_{T \le t} X_i\,dP_\theta = \int_0^\infty P_\theta[T \le t,\ X_i \ge u]\,du = \int_0^t \frac{t-u}{\theta}\left(\frac{t}{\theta}\right)^{k-1} du = \frac{t^{k+1}}{2\theta^k}$$


if $0 \le t \le \theta$. On the other hand, $P_\theta[T \le t] = (t/\theta)^k$, so that under $P_\theta$ the distribution of $T$ has density $kt^{k-1}/\theta^k$ over $[0, \theta]$. Thus


(34.11) $$\int_{T \le t} \frac{k+1}{2k}T\,dP_\theta = \frac{k+1}{2k}\int_0^t u\,\frac{ku^{k-1}}{\theta^k}\,du = \frac{t^{k+1}}{2\theta^k}.$$
Since (34.10) and (34.11) agree, (34.9) follows by Theorem 34.1. ■
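The agreement of (34.10) and (34.11) can also be checked numerically. The sketch below evaluates the two integrals by a midpoint rule and compares them with the closed form $t^{k+1}/(2\theta^k)$; the function names, the step count, and the sample values $\theta = 2$, $k = 3$, $t = 1.5$ are assumptions made only for illustration.

```python
def integral_34_10(t, theta, k, steps=20000):
    """Midpoint-rule value of int_0^t ((t - u)/theta) (t/theta)^(k-1) du."""
    h = t / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += ((t - u) / theta) * (t / theta) ** (k - 1) * h
    return total

def integral_34_11(t, theta, k, steps=20000):
    """Midpoint-rule value of ((k+1)/(2k)) int_0^t u * k u^(k-1) / theta^k du."""
    h = t / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += u * k * u ** (k - 1) / theta ** k * h
    return (k + 1) / (2 * k) * total

def closed_form(t, theta, k):
    """The common value t^(k+1) / (2 theta^k) of (34.10) and (34.11)."""
    return t ** (k + 1) / (2 * theta ** k)

theta, k, t = 2.0, 3, 1.5
a = integral_34_10(t, theta, k)
b = integral_34_11(t, theta, k)
c = closed_form(t, theta, k)
print(a, b, c)
```

The three printed values should agree to within the quadrature error, which is small for these polynomial integrands.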


The essential ideas in the proof of Theorem 34.6 are most easily understood 
through a preliminary consideration of special cases. 


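Lemma 1. Suppose that $[P_\theta\colon \theta \in \Theta]$ is dominated by a probability measure $P$ and that each $P_\theta$ has with respect to $P$ a density $g_\theta$ that is measurable $\mathscr{G}$. Then $\mathscr{G}$ is sufficient, and $P[A \| \mathscr{G}]$ is a version of $P_\theta[A \| \mathscr{G}]$ for each $\theta$ in $\Theta$.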


PROOF. For $G$ in $\mathscr{G}$, (34.4) gives
$$\int_G P[A \| \mathscr{G}]\,dP_\theta = \int_G E[I_A \| \mathscr{G}]\,g_\theta\,dP = \int_G E[I_A g_\theta \| \mathscr{G}]\,dP = \int_G I_A g_\theta\,dP = \int_{A \cap G} g_\theta\,dP = P_\theta(A \cap G).$$


Therefore, $P[A \| \mathscr{G}]$, the conditional probability calculated with respect to $P$, does serve as a version of $P_\theta[A \| \mathscr{G}]$ for each $\theta$ in $\Theta$. Thus $\mathscr{G}$ is sufficient for the family $[P_\theta\colon \theta \in \Theta]$, and even for this family augmented by $P$ (which might happen to lie in the family to start with). ■
For the necessity, suppose first that the family is dominated by one of its members. 


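Lemma 2. Suppose that $[P_\theta\colon \theta \in \Theta]$ is dominated by $P_{\theta_0}$ for some $\theta_0 \in \Theta$. If $\mathscr{G}$ is sufficient, then each $P_\theta$ has with respect to $P_{\theta_0}$ a density $g_\theta$ that is measurable $\mathscr{G}$.


PROOF. Let $p(A, \omega)$ be the function in the definition of sufficiency, and take $P_\theta[A \| \mathscr{G}]_\omega = p(A, \omega)$ for all $A \in \mathscr{F}$, $\omega \in \Omega$, and $\theta \in \Theta$. Let $d_\theta$ be any density of $P_\theta$ with respect to $P_{\theta_0}$. By a number of applications of (34.4),


$$\begin{aligned}
\int_A E_{\theta_0}[d_\theta \| \mathscr{G}]\,dP_{\theta_0} &= \int I_A E_{\theta_0}[d_\theta \| \mathscr{G}]\,dP_{\theta_0} \\
&= \int E_{\theta_0}\bigl[I_A E_{\theta_0}[d_\theta \| \mathscr{G}] \,\big\|\, \mathscr{G}\bigr]\,dP_{\theta_0} = \int E_{\theta_0}[I_A \| \mathscr{G}]\,E_{\theta_0}[d_\theta \| \mathscr{G}]\,dP_{\theta_0} \\
&= \int E_{\theta_0}\bigl[E_{\theta_0}[I_A \| \mathscr{G}]\,d_\theta \,\big\|\, \mathscr{G}\bigr]\,dP_{\theta_0} = \int E_{\theta_0}[I_A \| \mathscr{G}]\,d_\theta\,dP_{\theta_0} \\
&= \int P_{\theta_0}[A \| \mathscr{G}]\,dP_\theta = \int P_\theta[A \| \mathscr{G}]\,dP_\theta = P_\theta(A),
\end{aligned}$$


the next-to-last equality by sufficiency (the integrand on either side being $p(A, \cdot)$). Thus $g_\theta = E_{\theta_0}[d_\theta \| \mathscr{G}]$, which is measurable $\mathscr{G}$, can serve as a density for $P_\theta$ with respect to $P_{\theta_0}$. ■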


To complete the proof of Theorem 34.6 requires one more lemma of a technical 
sort. 


Lemma 3. If $[P_\theta\colon \theta \in \Theta]$ is dominated by a $\sigma$-finite measure, then it is equivalent to some finite or countably infinite subfamily.


In many examples, the $P_\theta$ are all equivalent to each other, in which case the subfamily can be taken to consist of a single $P_{\theta_0}$.


PROOF. Since $\mu$ is $\sigma$-finite, there is a finite or countable partition of $\Omega$ into $\mathscr{F}$-sets $A_n$ such that $0 < \mu(A_n) < \infty$. Choose positive constants $a_n$, one for each $A_n$, in such a way that $\sum_n a_n < \infty$. The finite measure with value $\sum_n a_n \mu(A \cap A_n)/\mu(A_n)$ at $A$ dominates $\mu$. In proving the lemma it is therefore no restriction to assume the family dominated by a finite measure $\mu$.

Each $P_\theta$ is dominated by $\mu$ and hence has a density $f_\theta$ with respect to it. Let $S_\theta = [\omega\colon f_\theta(\omega) > 0]$. Then $P_\theta(A) = P_\theta(A \cap S_\theta)$ for all $A$, and $P_\theta(A) = 0$ if and only if $\mu(A \cap S_\theta) = 0$. In particular, $S_\theta$ supports $P_\theta$.

Call a set $B$ in $\mathscr{F}$ a kernel if $B \subset S_\theta$ for some $\theta$, and call a finite or countable union of kernels a chain. Let $\alpha$ be the supremum of $\mu(C)$ over chains $C$. Since $\mu$ is finite and a finite or countable union of chains is a chain, $\alpha$ is finite and $\mu(C) = \alpha$ for some chain $C$. Suppose that $C = \bigcup_n B_n$, where each $B_n$ is a kernel, and suppose that $B_n \subset S_{\theta_n}$.

The problem is to show that $[P_\theta\colon \theta \in \Theta]$ is dominated by $[P_{\theta_n}\colon n = 1, 2, \ldots]$ and hence equivalent to it. Suppose that $P_{\theta_n}(A) = 0$ for all $n$. Then $\mu(A \cap S_{\theta_n}) = 0$, as observed above. Since $C \subset \bigcup_n S_{\theta_n}$, $\mu(A \cap C) = 0$, and it follows that $P_\theta(A \cap C) = 0$ whatever $\theta$ may be. But suppose that $P_\theta(A - C) > 0$. Then $P_\theta((A - C) \cap S_\theta) = P_\theta(A - C)$ is positive, and so $(A - C) \cap S_\theta$ is a kernel, disjoint from $C$, of positive $\mu$-measure; this is impossible because of the maximality of $C$. Thus $P_\theta(A - C)$ is 0 along with $P_\theta(A \cap C)$, and so $P_\theta(A) = 0$. ■


Suppose that $[P_\theta\colon \theta \in \Theta]$ is dominated by a $\sigma$-finite $\mu$, as in Theorem 34.6, so that, by Lemma 3, it contains a finite or infinite sequence $P_{\theta_1}, P_{\theta_2}, \ldots$ equivalent to the entire family. Fix one such sequence, and choose positive constants $c_n$, one for each $\theta_n$, in such a way that $\sum_n c_n = 1$. Now define a probability measure $P$ on $\mathscr{F}$ by


(34.12) $$P(A) = \sum_n c_n P_{\theta_n}(A).$$


Clearly, $P$ is equivalent to $[P_{\theta_1}, P_{\theta_2}, \ldots]$ and hence to $[P_\theta\colon \theta \in \Theta]$, and all three are dominated by $\mu$:


(34.13) $$P \equiv [P_{\theta_1}, P_{\theta_2}, \ldots] \equiv [P_\theta\colon \theta \in \Theta] \ll \mu.$$


PROOF OF SUFFICIENCY IN THEOREM 34.6. If each $P_\theta$ has density $g_\theta h$ with respect to $\mu$, then by the construction (34.12), $P$ has density $fh$ with respect to $\mu$, where $f = \sum_n c_n g_{\theta_n}$. Put $r_\theta = g_\theta/f$ if $f > 0$, and $r_\theta = 0$ (say) if $f = 0$. If each $g_\theta$ is measurable $\mathscr{G}$, the same is true of $f$ and hence of the $r_\theta$. Since $P[f = 0] = 0$ and $P$ is equivalent to the entire family, $P_\theta[f = 0] = 0$ for all $\theta$. Therefore,


$$\int_A r_\theta\,dP = \int_A r_\theta f h\,d\mu = \int_{A \cap [f > 0]} r_\theta f h\,d\mu = \int_{A \cap [f > 0]} g_\theta h\,d\mu = P_\theta(A \cap [f > 0]) = P_\theta(A).$$


Each $P_\theta$ thus has with respect to the probability measure $P$ a density measurable $\mathscr{G}$, and it follows by Lemma 1 that $\mathscr{G}$ is sufficient. ■


PROOF OF NECESSITY IN THEOREM 34.6. Let $p(A, \omega)$ be a function such that, for each $A$ and $\theta$, $p(A, \cdot)$ is a version of $P_\theta[A \| \mathscr{G}]$, as required by the definition of sufficiency. For $P$ as in (34.12) and $G \in \mathscr{G}$,


(34.14) $$\int_G p(A, \omega)\,P(d\omega) = \sum_n c_n \int_G p(A, \omega)\,P_{\theta_n}(d\omega) = \sum_n c_n \int_G P_{\theta_n}[A \| \mathscr{G}]\,dP_{\theta_n} = \sum_n c_n P_{\theta_n}(A \cap G) = P(A \cap G).$$


Thus $p(A, \cdot)$ serves as a version of $P[A \| \mathscr{G}]$ as well, and $\mathscr{G}$ is still sufficient if $P$ is added to the family. Since $P$ dominates the augmented family, Lemma 2 implies that each $P_\theta$ has with respect to $P$ a density $g_\theta$ that is measurable $\mathscr{G}$. But if $h$ is the density of $P$ with respect to $\mu$ (see (34.13)), then $P_\theta$ has density $g_\theta h$ with respect to $\mu$. ■




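A sub-$\sigma$-field $\mathscr{G}_0$ sufficient with respect to $[P_\theta\colon \theta \in \Theta]$ is minimal if, for each sufficient $\mathscr{G}$, $\mathscr{G}_0$ is essentially contained in $\mathscr{G}$ in the sense that for each $A$ in $\mathscr{G}_0$ there is a $B$ in $\mathscr{G}$ such that $P_\theta(A \,\triangle\, B) = 0$ for all $\theta$ in $\Theta$. A sufficient $\mathscr{G}$ represents a compression of the information in $\mathscr{F}$, and a minimal sufficient $\mathscr{G}$ represents the greatest possible compression.

Suppose the densities $f_\theta$ of the $P_\theta$ with respect to $\mu$ have the property that $f_\theta(\omega)$ is measurable $\mathscr{C} \times \mathscr{F}$, where $\mathscr{C}$ is a $\sigma$-field in $\Theta$. Let $\pi$ be a probability measure on $\mathscr{C}$, and define $P$ as $\int_\Theta P_\theta\,\pi(d\theta)$, in the sense that $P(A) = \int_\Theta \int_A f_\theta(\omega)\,\mu(d\omega)\,\pi(d\theta) = \int_\Theta P_\theta(A)\,\pi(d\theta)$. Obviously, $P \ll [P_\theta\colon \theta \in \Theta]$. Assume that


(34.15) $$[P_\theta\colon \theta \in \Theta] \ll P = \int_\Theta P_\theta\,\pi(d\theta).$$


If $\pi$ has mass $c_n$ at $\theta_n$, then $P$ is given by (34.12), and of course, (34.15) holds if (34.13) does. Let $r_\theta$ be a density for $P_\theta$ with respect to $P$.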


Theorem 34.7. If (34.15) holds, then $\mathscr{G}_0 = \sigma[r_\theta\colon \theta \in \Theta]$ is a minimal sufficient sub-$\sigma$-field.


PROOF. That $\mathscr{G}_0$ is sufficient follows by Theorem 34.6. Suppose that $\mathscr{G}$ is sufficient. It follows by a simple extension of (34.14) that $\mathscr{G}$ is still sufficient if $P$ is added to the family, and then it follows by Lemma 2 that each $P_\theta$ has with respect to $P$ a density $g_\theta$ that is measurable $\mathscr{G}$. Since densities are essentially unique, $P[g_\theta = r_\theta] = 1$. Let $\mathscr{H}$ be the class of $A$ in $\mathscr{G}_0$ such that $P(A \,\triangle\, B) = 0$ for some $B$ in $\mathscr{G}$. Then $\mathscr{H}$ is a $\sigma$-field containing each set of the form $A = [r_\theta \in H]$ (take $B = [g_\theta \in H]$) and hence containing $\mathscr{G}_0$. Since, by (34.15), $P$ dominates each $P_\theta$, $\mathscr{G}_0$ is essentially contained in $\mathscr{G}$, in the sense of the definition. ■


Minimum-Variance Estimation*


To illustrate sufficiency, let $g$ be a real function on $\Theta$, and consider the problem of estimating $g(\theta)$. One possibility is that $\Theta$ is a subset of the line and $g$ is the identity; another is that $\Theta$ is a subset of $R^k$ and $g$ picks out one of the coordinates. (This problem is considered from a slightly different point of view at the end of Section 19.) An estimate of $g(\theta)$ is a random variable $Z$, and the estimate is unbiased if $E_\theta[Z] = g(\theta)$ for all $\theta$. One measure of the accuracy of the estimate $Z$ is $E_\theta[(Z - g(\theta))^2]$.

If $\mathscr{G}$ is sufficient, it follows by linearity (Theorem 34.2(ii)) that $E_\theta[X \| \mathscr{G}]$ has for simple $X$ a version that is independent of $\theta$. Since there are simple $X_n$ such that $|X_n| \le |X|$ and $X_n \to X$, the same is true of any $X$ that is integrable with respect to each $P_\theta$ (use Theorem 34.2(v)). Suppose that $\mathscr{G}$ is, in fact, sufficient, and denote by $E[X \| \mathscr{G}]$ a version of $E_\theta[X \| \mathscr{G}]$ that is independent of $\theta$.


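Theorem 34.8. Suppose that $E_\theta[(Z - g(\theta))^2] < \infty$ for all $\theta$ and that $\mathscr{G}$ is sufficient. Then


(34.16) $$E_\theta[(E[Z \| \mathscr{G}] - g(\theta))^2] \le E_\theta[(Z - g(\theta))^2]$$


for all $\theta$. If $Z$ is unbiased, then so is $E[Z \| \mathscr{G}]$.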


*This topic may be omitted. 




PROOF. By Jensen's inequality (34.7) for $\varphi(x) = (x - g(\theta))^2$, $(E[Z \| \mathscr{G}] - g(\theta))^2 \le E_\theta[(Z - g(\theta))^2 \| \mathscr{G}]$. Applying $E_\theta$ to each side gives (34.16). The second statement follows from the fact that $E_\theta[E[Z \| \mathscr{G}]] = E_\theta[Z]$. ■


This, the Rao-Blackwell theorem, says that $E[Z \| \mathscr{G}]$ is at least as good an estimate as $Z$ if $\mathscr{G}$ is sufficient.


Example 34.5. Returning to Example 34.4, note that each $X_i$ has mean $\theta/2$ under $P_\theta$, so that if $\bar{X} = k^{-1}\sum_{i=1}^k X_i$ is the sample mean, then $2\bar{X}$ is an unbiased estimate of $\theta$. But there is a better one. By (34.9), $E_\theta[2\bar{X} \| T] = (k+1)T/k = T'$, and by the Rao-Blackwell theorem, $T'$ is an unbiased estimate with variance at most that of $2\bar{X}$.

In fact, for an arbitrary unbiased estimate $Z$, $E_\theta[(T' - \theta)^2] \le E_\theta[(Z - \theta)^2]$. To prove this, let $\delta = T' - E[Z \| T]$. By Theorem 20.1(ii), $\delta = f(T)$ for some Borel function $f$, and $E_\theta[f(T)] = 0$ for all $\theta$. Taking account of the density for $T$ leads to $\int_0^\theta f(x)x^{k-1}\,dx = 0$, so that $f(x)x^{k-1}$ integrates to 0 over all intervals. Therefore, $f(x)$ along with $f(x)x^{k-1}$ vanishes for $x > 0$, except on a set of Lebesgue measure 0, and hence $P_\theta[f(T) = 0] = 1$ and $P_\theta[T' = E[Z \| T]] = 1$ for all $\theta$. Therefore, $E_\theta[(T' - \theta)^2] = E_\theta[(E[Z \| T] - \theta)^2] \le E_\theta[(Z - \theta)^2]$ for $Z$ unbiased, and $T'$ has minimum variance among all unbiased estimates of $\theta$. ■
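The size of the improvement can be made concrete. The sketch below computes the two variances exactly, using only the facts stated above: each $X_i$ is uniform on $[0, \theta]$ (so its variance is $\theta^2/12$, a standard fact assumed here) and $T$ has density $kt^{k-1}/\theta^k$ on $[0, \theta]$, which gives $E_\theta[T^m] = k\theta^m/(k+m)$ by direct integration. The function names are hypothetical.

```python
from fractions import Fraction

def moment_T(m, k, theta):
    """E[T^m] = int_0^theta t^m * k t^(k-1) / theta^k dt = k theta^m / (k + m)."""
    return Fraction(k, k + m) * theta ** m

def mean_T_prime(k, theta):
    """Mean of T' = (k+1) T / k; it should equal theta (unbiasedness)."""
    return Fraction(k + 1, k) * moment_T(1, k, theta)

def var_T_prime(k, theta):
    """Variance of T' = (k+1) T / k, from the first two moments of T."""
    c = Fraction(k + 1, k)
    return c ** 2 * (moment_T(2, k, theta) - moment_T(1, k, theta) ** 2)

def var_twice_sample_mean(k, theta):
    """Variance of 2 * (sample mean) for k independent uniforms on [0, theta]."""
    return 4 * (theta ** 2 / 12) / k

theta = Fraction(1)
k = 5
v_new = var_T_prime(k, theta)
v_old = var_twice_sample_mean(k, theta)
print(mean_T_prime(k, theta), v_new, v_old)
```

Under these assumptions the variances work out to $\theta^2/(k(k+2))$ for $T'$ and $\theta^2/(3k)$ for $2\bar{X}$, so the two coincide only when $k = 1$.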


PROBLEMS 


34.1. Work out for conditional expected values the analogues of Problems 33.4, 33.5, 
and 33.9. 


34.2. In the context of Examples 33.5 and 33.12, show that the conditional expected value of $Y$ (if it is integrable) given $X$ is $g(X)$, where


$$g(x) = \frac{\int_{-\infty}^{\infty} f(x, y)\,y\,dy}{\int_{-\infty}^{\infty} f(x, y)\,dy}.$$


34.3. Show that the independence of $X$ and $Y$ implies that $E[Y \| X] = E[Y]$, which in turn implies that $E[XY] = E[X]E[Y]$. Show by examples in an $\Omega$ of three points that the reverse implications are both false.


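34.4. (a) Let $B$ be an event with $P(B) > 0$, and define a probability measure $P_0$ by $P_0(A) = P(A \mid B)$. Show that $P_0[A \| \mathscr{G}] = P[A \cap B \| \mathscr{G}]/P[B \| \mathscr{G}]$ on a set of $P_0$-measure 1.

(b) Suppose that $\mathscr{H}$ is generated by a partition $B_1, B_2, \ldots$, and let $\mathscr{G} \vee \mathscr{H} = \sigma(\mathscr{G} \cup \mathscr{H})$. Show that with probability 1,


$$P[A \| \mathscr{G} \vee \mathscr{H}] = \sum_i I_{B_i}\,\frac{P[A \cap B_i \| \mathscr{G}]}{P[B_i \| \mathscr{G}]}.$$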


34.5. The equation (34.5) was proved by showing that the left side is a version of the right side. Prove it by showing that the right side is a version of the left side.


34.6. Prove for bounded $X$ and $Y$ that $E[YE[X \| \mathscr{G}]] = E[XE[Y \| \mathscr{G}]]$.

34.7. 33.9↑ Generalize Theorem 34.5 by replacing $X$ with a random vector.

34.8. Assume that $X$ is nonnegative but not necessarily integrable. Show that it is still possible to define a nonnegative random variable $E[X \| \mathscr{G}]$, measurable $\mathscr{G}$, such that (34.1) holds. Prove versions of the monotone convergence theorem and Fatou's lemma.


34.9. (a) Show for nonnegative $X$ that $E[X \| \mathscr{G}] = \int_0^\infty P[X > t \| \mathscr{G}]\,dt$ with probability 1.

(b) Generalize Markov's inequality: $P[|X| \ge \alpha \| \mathscr{G}] \le \alpha^{-k}E[|X|^k \| \mathscr{G}]$ with probability 1.

(c) Similarly generalize Chebyshev's and Hölder's inequalities.


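34.10. (a) Show that, if $\mathscr{G}_1 \subset \mathscr{G}_2$ and $E[X^2] < \infty$, then $E[(X - E[X \| \mathscr{G}_2])^2] \le E[(X - E[X \| \mathscr{G}_1])^2]$. The dispersion of $X$ about its conditional mean becomes smaller as the $\sigma$-field grows.

(b) Define $\mathrm{Var}[X \| \mathscr{G}] = E[(X - E[X \| \mathscr{G}])^2 \| \mathscr{G}]$. Prove that $\mathrm{Var}[X] = E[\mathrm{Var}[X \| \mathscr{G}]] + \mathrm{Var}[E[X \| \mathscr{G}]]$.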


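34.11. Let $\mathscr{G}_1, \mathscr{G}_2, \mathscr{G}_3$ be $\sigma$-fields in $\mathscr{F}$, let $\mathscr{G}_{ij}$ be the $\sigma$-field generated by $\mathscr{G}_i \cup \mathscr{G}_j$, and let $A_i$ be the generic set in $\mathscr{G}_i$. Consider three conditions:
(i) $P[A_3 \| \mathscr{G}_{12}] = P[A_3 \| \mathscr{G}_2]$ for all $A_3$.
(ii) $P[A_1 \cap A_3 \| \mathscr{G}_2] = P[A_1 \| \mathscr{G}_2]\,P[A_3 \| \mathscr{G}_2]$ for all $A_1$ and $A_3$.
(iii) $P[A_1 \| \mathscr{G}_{23}] = P[A_1 \| \mathscr{G}_2]$ for all $A_1$.

If $\mathscr{G}_1$, $\mathscr{G}_2$, and $\mathscr{G}_3$ are interpreted as descriptions of the past, present, and future, respectively, (i) is a general version of the Markov property: the conditional probability of a future event $A_3$ given the past and present $\mathscr{G}_{12}$ is the same as the conditional probability given the present $\mathscr{G}_2$ alone. Condition (iii) is the same with time reversed. And (ii) says that past and future events $A_1$ and $A_3$ are conditionally independent given the present $\mathscr{G}_2$. Prove the three conditions equivalent.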


34.12. 33.7 34.11↑ Use Example 33.10 to calculate $P[N_s = k \| N_u, u \ge t]$ ($s \le t$) for the Poisson process.


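34.13. Let $L^2$ be the Hilbert space of square-integrable random variables on $(\Omega, \mathscr{F}, P)$. For $\mathscr{G}$ a $\sigma$-field in $\mathscr{F}$, let $M_{\mathscr{G}}$ be the subspace of elements of $L^2$ that are measurable $\mathscr{G}$. Show that the operator $P_{\mathscr{G}}$ defined for $X \in L^2$ by $P_{\mathscr{G}}X = E[X \| \mathscr{G}]$ is the perpendicular projection on $M_{\mathscr{G}}$.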


34.14. ↑ Suppose in Problem 34.13 that $\mathscr{G} = \sigma(Z)$ for a random variable $Z$ in $L^2$. Let $S_Z$ be the one-dimensional subspace spanned by $Z$. Show that $S_Z$ may be much smaller than $M_{\sigma(Z)}$, so that $E[X \| Z]$ (for $X \in L^2$) is by no means the projection of $X$ on $Z$. Hint: Take $Z$ the identity function on the unit interval with Lebesgue measure.


34.15. ↑ Problem 34.13 can be turned around to give an alternative approach to conditional probability and expected value. For a $\sigma$-field $\mathscr{G}$ in $\mathscr{F}$, let $P_{\mathscr{G}}$ be the perpendicular projection on the subspace $M_{\mathscr{G}}$. Show that $P_{\mathscr{G}}X$ has for $X \in L^2$ the two properties required of $E[X \| \mathscr{G}]$. Use this to define $E[X \| \mathscr{G}]$ for $X \in L^2$ and then extend it to all integrable $X$ via approximation by random variables in $L^2$. Now define conditional probability.


34.16. Mixing sequences. A sequence $A_1, A_2, \ldots$ of $\mathscr{F}$-sets in a probability space $(\Omega, \mathscr{F}, P)$ is mixing with constant $\alpha$ if


(34.17) $$\lim_n P(A_n \cap E) = \alpha P(E)$$


for every $E$ in $\mathscr{F}$. Then $\alpha = \lim_n P(A_n)$.
(a) Show that $(A_n)$ is mixing with constant $\alpha$ if and only if


(34.18) $$\lim_n \int_{A_n} X\,dP = \alpha \int X\,dP$$


for each integrable $X$ (measurable $\mathscr{F}$).

(b) Suppose that (34.17) holds for $E \in \mathscr{P}$, where $\mathscr{P}$ is a $\pi$-system, $\Omega \in \mathscr{P}$, and $A_n \in \sigma(\mathscr{P})$ for all $n$. Show that $(A_n)$ is mixing. Hint: First check (34.18) for $X$ measurable $\sigma(\mathscr{P})$ and then use conditional expected values with respect to $\sigma(\mathscr{P})$.

(c) Show that, if $P_0$ is a probability measure on $(\Omega, \mathscr{F})$ and $P_0 \ll P$, then mixing is preserved if $P$ is replaced by $P_0$.


34.17. ↑ Application of mixing to the central limit theorem. Let $X_1, X_2, \ldots$ be random variables on $(\Omega, \mathscr{F}, P)$, independent and identically distributed with mean 0 and variance $\sigma^2$, and put $S_n = X_1 + \cdots + X_n$. Then $S_n/\sigma\sqrt{n} \Rightarrow N$ by the Lindeberg-Lévy theorem. Show by the steps below that this still holds if $P$ is replaced by any probability measure $P_0$ on $(\Omega, \mathscr{F})$ that $P$ dominates. For example, the central limit theorem applies to the sums $\sum_{k=1}^n r_k(\omega)$ of Rademacher functions if $\omega$ is chosen according to the uniform density over the unit interval, and this result shows that the same is true if $\omega$ is chosen according to an arbitrary density.

Let $Y_n = S_n/\sigma\sqrt{n}$ and $Z_n = (S_n - S_{[\log n]})/\sigma\sqrt{n}$, and take $\mathscr{P}$ to consist of the sets of the form $[(X_1, \ldots, X_k) \in H]$, $k \ge 1$, $H \in \mathscr{R}^k$. Prove successively:
(a) $P[Y_n \le x] \to P[N \le x]$.
(b) $P[|Y_n - Z_n| \ge \epsilon] \to 0$.
(c) $P[Z_n \le x] \to P[N \le x]$.
(d) $P(E \cap [Z_n \le x]) \to P(E)P[N \le x]$ for $E \in \mathscr{P}$.
(e) $P(E \cap [Z_n \le x]) \to P(E)P[N \le x]$ for $E \in \mathscr{F}$.
(f) $P_0[Z_n \le x] \to P[N \le x]$.
(g) $P_0[|Y_n - Z_n| \ge \epsilon] \to 0$.
(h) $P_0[Y_n \le x] \to P[N \le x]$.


34.18. Suppose that $\mathscr{G}$ is a sufficient subfield for the family of probability measures $P_\theta$, $\theta \in \Theta$, on $(\Omega, \mathscr{F})$. Suppose that for each $\theta$ and $A$, $p(A, \omega)$ is a version of $P_\theta[A \| \mathscr{G}]_\omega$, and suppose further that for each $\omega$, $p(\cdot, \omega)$ is a probability measure on $\mathscr{F}$. Define $Q_\theta$ on $\mathscr{F}$ by $Q_\theta(A) = \int_\Omega p(A, \omega)\,P_\theta(d\omega)$, and show that $Q_\theta = P_\theta$.

The idea is that an observer with the information in $\mathscr{G}$ (but ignorant of $\omega$ itself) in principle knows the values $p(A, \omega)$ because each $p(A, \cdot)$ is measurable $\mathscr{G}$. If he has the appropriate randomization device, he can draw an $\omega'$ from $\Omega$ according to the probability measure $p(\cdot, \omega)$, and his $\omega'$ will have the same distribution $Q_\theta = P_\theta$ that $\omega$ has. Thus, whatever the value of the unknown $\theta$, the observer can on the basis of the information in $\mathscr{G}$ alone, and without knowing $\theta$ itself, construct a probabilistic replica of $\omega$.


34.19. 34.13↑ In the context of the discussion on p. 252, let F be the σ-field of sets
of the form Θ × A for A ∈ F. Show that under the probability measure Q, to
is the conditional expected value of g, given F.


34.20. (a) In Example 34.4, take $\pi$ to have density $e^{-\theta}$ over $\Theta = (0, \infty)$. Show by
Theorem 34.7 that $T$ is a minimal sufficient statistic (in the sense that $\sigma(T)$ is
minimal).


(b) Let $P_\theta$ be the distribution for samples of size $n$ from a normal distribution
with parameter $\theta = (m, \sigma^2)$, $\sigma^2 > 0$, and let $\pi$ put unit mass at $(0, 1)$. Show
that the sample mean and variance form a minimal sufficient statistic.


SECTION 35. MARTINGALES 


Definition 


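Let $X_1, X_2, \ldots$ be a sequence of random variables on a probability space
$(\Omega, \mathcal{F}, P)$, and let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be a sequence of σ-fields in $\mathcal{F}$. The sequence
$\{(X_n, \mathcal{F}_n)\colon n = 1, 2, \ldots\}$ is a martingale if these four conditions hold:


(i) $\mathcal{F}_n \subset \mathcal{F}_{n+1}$;

(ii) $X_n$ is measurable $\mathcal{F}_n$;

(iii) $E[|X_n|] < \infty$;

(iv) with probability 1,


(35.1)    $E[X_{n+1} \| \mathcal{F}_n] = X_n.$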


Alternatively, the sequence $X_1, X_2, \ldots$ is said to be a martingale relative to
the σ-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$. Condition (i) is expressed by saying the $\mathcal{F}_n$ form a
filtration and condition (ii) by saying the $X_n$ are adapted to the filtration.

If $X_n$ represents the fortune of a gambler after the $n$th play and $\mathcal{F}_n$
represents his information about the game at that time, (35.1) says that his
expected fortune after the next play is the same as his present fortune. Thus
a martingale represents a fair game, and sums of independent random
variables with mean 0 give one example. As will be seen below, martingales
arise in very diverse connections.




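The sequence $X_1, X_2, \ldots$ is defined to be a martingale if it is a martingale
relative to some sequence $\mathcal{F}_1, \mathcal{F}_2, \ldots$. In this case, the σ-fields $\mathcal{G}_n =
\sigma(X_1, \ldots, X_n)$ always work: Obviously, $\mathcal{G}_n \subset \mathcal{G}_{n+1}$ and $X_n$ is measurable $\mathcal{G}_n$,
and if (35.1) holds, then $E[X_{n+1} \| \mathcal{G}_n] = E[E[X_{n+1} \| \mathcal{F}_n] \| \mathcal{G}_n] = E[X_n \| \mathcal{G}_n] =
X_n$ by (34.5). For these special σ-fields $\mathcal{G}_n$, (35.1) reduces to


(35.2)    $E[X_{n+1} \| X_1, \ldots, X_n] = X_n.$

Since $\sigma(X_1, \ldots, X_n) \subset \mathcal{F}_n$ if and only if $X_n$ is measurable $\mathcal{F}_n$ for each $n$,
the $\sigma(X_1, \ldots, X_n)$ are the smallest σ-fields with respect to which the $X_n$ are a
martingale.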


The essential condition is embodied in (35.1) and in its specialization
(35.2). Condition (iii) is of course needed to ensure that $E[X_{n+1} \| \mathcal{F}_n]$ exists.
Condition (iv) says that $X_n$ is a version of $E[X_{n+1} \| \mathcal{F}_n]$; since $X_n$ is
measurable $\mathcal{F}_n$, the requirement reduces to


(35.3)    $\int_A X_{n+1}\, dP = \int_A X_n\, dP, \qquad A \in \mathcal{F}_n.$


Since the $\mathcal{F}_n$ are nested, $A \in \mathcal{F}_n$ implies that $\int_A X_n\, dP = \int_A X_{n+1}\, dP =
\cdots = \int_A X_{n+k}\, dP$. Therefore, $X_n$, being measurable $\mathcal{F}_n$, is a version of
$E[X_{n+k} \| \mathcal{F}_n]$:


(35.4)    $E[X_{n+k} \| \mathcal{F}_n] = X_n$

with probability 1 for $k \ge 1$. Note that for $A = \Omega$, (35.3) gives

(35.5)    $E[X_1] = E[X_2] = \cdots.$


The defining conditions for a martingale can also be given in terms of the
differences


(35.6)    $\Delta_n = X_n - X_{n-1}$


($\Delta_1 = X_1$). By linearity, (35.1) is the same thing as


(35.7)    $E[\Delta_{n+1} \| \mathcal{F}_n] = 0.$

Note that, since $X_k = \Delta_1 + \cdots + \Delta_k$ and $\Delta_k = X_k - X_{k-1}$, the sets $X_1, \ldots, X_n$
and $\Delta_1, \ldots, \Delta_n$ generate the same σ-field:

(35.8)    $\sigma(X_1, \ldots, X_n) = \sigma(\Delta_1, \ldots, \Delta_n).$


Example 35.1. Let $\Delta_1, \Delta_2, \ldots$ be independent, integrable random variables
such that $E[\Delta_n] = 0$ for $n \ge 2$. If $\mathcal{F}_n$ is the σ-field (35.8), then by
independence $E[\Delta_{n+1} \| \mathcal{F}_n] = E[\Delta_{n+1}] = 0$. If $\Delta$ is another random variable,
independent of the $\Delta_n$, and if $\mathcal{F}_n$ is replaced by $\sigma(\Delta, \Delta_1, \ldots, \Delta_n)$, then the
$X_n = \Delta_1 + \cdots + \Delta_n$ are still a martingale relative to the $\mathcal{F}_n$. It is natural and
convenient in the theory to allow σ-fields $\mathcal{F}_n$ larger than the minimal ones
(35.8). ■


Example 35.2. Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\nu$ be a finite
measure on $\mathcal{F}$, and let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be a nondecreasing sequence of σ-fields
in $\mathcal{F}$. Suppose that $P$ dominates $\nu$ when both are restricted to $\mathcal{F}_n$; that is,
suppose that $A \in \mathcal{F}_n$ and $P(A) = 0$ together imply that $\nu(A) = 0$. There is
then a density or Radon-Nikodym derivative $X_n$ of $\nu$ with respect to $P$
when both are restricted to $\mathcal{F}_n$; $X_n$ is a function that is measurable $\mathcal{F}_n$ and
integrable with respect to $P$, and it satisfies


(35.9)    $\int_A X_n\, dP = \nu(A), \qquad A \in \mathcal{F}_n.$


If $A \in \mathcal{F}_n$, then $A \in \mathcal{F}_{n+1}$ as well, so that $\int_A X_{n+1}\, dP = \nu(A)$; this and (35.9)
give (35.3). Thus the $X_n$ are a martingale with respect to the $\mathcal{F}_n$. ■


Example 35.3. For a specialization of the preceding example, let $P$ be
Lebesgue measure on the σ-field $\mathcal{F}$ of Borel subsets of $\Omega = (0, 1]$, and let $\mathcal{F}_n$
be the finite σ-field generated by the partition of $\Omega$ into dyadic intervals
$(k2^{-n}, (k+1)2^{-n}]$, $0 \le k < 2^n$. If $A \in \mathcal{F}_n$ and $P(A) = 0$, then $A$ is empty.
Hence $P$ dominates every finite measure $\nu$ on $\mathcal{F}_n$. The Radon-Nikodym
derivative is


(35.10)    $X_n(\omega) = \dfrac{\nu(k2^{-n}, (k+1)2^{-n}]}{2^{-n}}$    if $\omega \in (k2^{-n}, (k+1)2^{-n}]$.


There is no need here to assume that $P$ dominates $\nu$ when they are
viewed as measures on all of $\mathcal{F}$. Suppose that $\nu$ is the distribution of
$\sum_{k=1}^{\infty} Z_k 2^{-k}$ for independent $Z_k$ assuming values 1 and 0 with probabilities $p$
and $1 - p$. This is the measure in Examples 31.1 and 31.3 (there denoted by
$\mu$), and for $p \ne \tfrac{1}{2}$, $\nu$ is singular with respect to Lebesgue measure $P$. It is
nonetheless absolutely continuous with respect to $P$ when both are restricted
to $\mathcal{F}_n$. ■
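The density (35.10) for the coin-tossing measure can be computed exactly. The following is a minimal sketch rather than part of the text's argument; it assumes that a dyadic interval of rank $n$ is identified with the first $n$ binary digits of its points, so that $\nu$ assigns it mass $p^{\text{ones}}(1-p)^{\text{zeros}}$. The function name `density` and the choice $p = 1/3$ are illustrative only. The sketch checks the martingale property (35.3) on each atom, confirms that $E[X_n] = \nu(\Omega) = 1$, and shows that along a digit sequence with equally many 0's and 1's the density is $(4p(1-p))^{n/2}$, which decreases to 0 when $p \ne \tfrac{1}{2}$.

```python
from fractions import Fraction
from itertools import product

def density(digits, p):
    """X_n on the dyadic interval whose points have leading binary digits `digits`.

    Assumes nu(interval) = p^(#ones) * (1-p)^(#zeros) and P(interval) = 2^(-n).
    """
    n = len(digits)
    ones = sum(digits)
    nu = p**ones * (1 - p)**(n - ones)
    return nu * 2**n

p = Fraction(1, 3)  # illustrative value; any 0 < p < 1 works

# Martingale property (35.3) on each atom of F_n: X_n equals the average of
# X_{n+1} over the two half-intervals.
martingale_ok = all(
    (density(d + (0,), p) + density(d + (1,), p)) / 2 == density(d, p)
    for n in range(1, 6)
    for d in product((0, 1), repeat=n)
)

# E[X_n] under Lebesgue measure: each atom has probability 2^(-5).
mean_X5 = sum(density(d, p) for d in product((0, 1), repeat=5)) / 2**5

# Along balanced digits 0,1,0,1,... the density is (4p(1-p))^k after 2k digits.
balanced = [density((0, 1) * k, p) for k in (1, 5, 20)]
```

With $p = 1/3$ the balanced-path values are powers of $8/9$, so the density shrinks geometrically even though its expected value stays at 1.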


Example 35.4. For another specialization of Example 35.2, suppose that $\nu$
is a probability measure $Q$ on $\mathcal{F}$ and that $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$ for random
variables $Y_1, Y_2, \ldots$ on $(\Omega, \mathcal{F})$. Suppose that under the measure $P$ the
distribution of the random vector $(Y_1, \ldots, Y_n)$ has density $p_n(y_1, \ldots, y_n)$ with
respect to $n$-dimensional Lebesgue measure and that under $Q$ it has density
$q_n(y_1, \ldots, y_n)$. To avoid technicalities, assume that $p_n$ is everywhere positive.
Then the Radon-Nikodym derivative for $Q$ with respect to $P$ on $\mathcal{F}_n$ is


(35.11)    $X_n = \dfrac{q_n(Y_1, \ldots, Y_n)}{p_n(Y_1, \ldots, Y_n)}.$

To see this, note that the general element of $\mathcal{F}_n$ is $[(Y_1, \ldots, Y_n) \in H]$,
$H \in \mathcal{R}^n$; by the change-of-variable formula,

$\int_{[(Y_1, \ldots, Y_n) \in H]} \dfrac{q_n(Y_1, \ldots, Y_n)}{p_n(Y_1, \ldots, Y_n)}\, dP = \int_H \dfrac{q_n(y_1, \ldots, y_n)}{p_n(y_1, \ldots, y_n)}\, p_n(y_1, \ldots, y_n)\, dy_1 \cdots dy_n$

$= Q[(Y_1, \ldots, Y_n) \in H].$


In statistical terms, (35.11) is a likelihood ratio: $p_n$ and $q_n$ are rival
densities, and the larger $X_n$ is, the more strongly one prefers $q_n$ as an
explanation of the observation $(Y_1, \ldots, Y_n)$. The analysis is carried out under
the assumption that $P$ is the measure actually governing the $Y_n$; that is, $X_n$ is
a martingale under $P$ and not in general under $Q$.

In the most common case the $Y_n$ are independent and identically distributed
under both $P$ and $Q$: $p_n(y_1, \ldots, y_n) = p(y_1) \cdots p(y_n)$ and
$q_n(y_1, \ldots, y_n) = q(y_1) \cdots q(y_n)$ for densities $p$ and $q$ on the line, where $p$ is
assumed everywhere positive for simplicity. Suppose that the measures
corresponding to the densities $p$ and $q$ are not identical, so that $P[Y_n \in H] \ne
Q[Y_n \in H]$ for some $H \in \mathcal{R}^1$. If $Z_n = I_{[Y_n \in H]}$, then by the strong law of large
numbers, $n^{-1} \sum_{k=1}^{n} Z_k$ converges to $P[Y_1 \in H]$ on a set (in $\mathcal{F}$) of $P$-measure
1 and to $Q[Y_1 \in H]$ on a (disjoint) set of $Q$-measure 1. Thus $P$ and $Q$ are
mutually singular on $\mathcal{F}$ even though $P$ dominates $Q$ on $\mathcal{F}_n$. ■


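Example 35.5. Suppose that $Z$ is an integrable random variable on
$(\Omega, \mathcal{F}, P)$ and that $\mathcal{F}_n$ are nondecreasing σ-fields in $\mathcal{F}$. If


(35.12)    $X_n = E[Z \| \mathcal{F}_n],$


then the first three conditions in the martingale definition are satisfied, and
by (34.5), $E[X_{n+1} \| \mathcal{F}_n] = E[E[Z \| \mathcal{F}_{n+1}] \| \mathcal{F}_n] = E[Z \| \mathcal{F}_n] = X_n$. Thus $X_n$ is a
martingale relative to $\mathcal{F}_n$. ■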


Example 35.6. Let $N_{nk}$, $n, k = 1, 2, \ldots$, be an independent array of
identically distributed random variables assuming the values $0, 1, 2, \ldots$. Define
$Z_0, Z_1, Z_2, \ldots$ inductively by $Z_0(\omega) = 1$ and $Z_n(\omega) = N_{n1}(\omega)
+ \cdots + N_{n, Z_{n-1}(\omega)}(\omega)$; $Z_n(\omega) = 0$ if $Z_{n-1}(\omega) = 0$. If $N_{nk}$ is thought of as the
number of progeny of an organism, and if $Z_{n-1}$ represents the size at time
$n - 1$ of a population of these organisms, then $Z_n$ represents the size at time
$n$. If the expected number of progeny is $E[N_{nk}] = m$, then $E[Z_n \| Z_{n-1}] =
Z_{n-1} m$, so that $X_n = Z_n/m^n$, $n = 0, 1, 2, \ldots$, is a martingale. The sequence
$Z_0, Z_1, \ldots$ is a branching process. ■
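The normalization by $m^n$ can be checked exactly for a small offspring law. The sketch below is only an illustration: the offspring distribution (values 0, 1, 2 with probabilities 1/2, 1/4, 1/4, so that $m = 3/4$) and the helper names `convolve`, `next_generation`, and `mean` are assumptions introduced here. It computes the exact distribution of $Z_n$ by conditioning on $Z_{n-1}$, then records $E[Z_n]/m^n$ and the extinction probability $P[Z_n = 0]$ for each generation.

```python
from fractions import Fraction

def convolve(p, q):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def next_generation(dist, offspring):
    """Distribution of Z_n given that of Z_{n-1}: a sum of Z_{n-1} i.i.d. offspring counts."""
    result = [Fraction(0)]
    power = [Fraction(1)]          # k-fold convolution of `offspring`, starting at k = 0
    for prob_k in dist:
        if len(power) > len(result):
            result += [Fraction(0)] * (len(power) - len(result))
        for i, a in enumerate(power):
            result[i] += prob_k * a
        power = convolve(power, offspring)
    return result

def mean(dist):
    return sum(i * p for i, p in enumerate(dist))

# Hypothetical offspring law: P[N=0], P[N=1], P[N=2].
offspring = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
m = mean(offspring)                # expected number of progeny, here 3/4

dist = [Fraction(0), Fraction(1)]  # Z_0 = 1
history = []                       # entries (n, E[Z_n] / m^n, P[Z_n = 0])
for n in range(1, 6):
    dist = next_generation(dist, offspring)
    history.append((n, mean(dist) / m**n, dist[0]))
```

In this run $E[X_n] = E[Z_n]/m^n$ equals 1 in every generation, as the martingale property requires, while the extinction probability increases with $n$.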




In the preceding definition and examples, $n$ ranges over the positive
integers. The definition makes sense if $n$ ranges over $1, 2, \ldots, N$; here
conditions (ii) and (iii) are required for $1 \le n \le N$ and conditions (i) and (iv)
only for $1 \le n < N$. It is, in fact, clear that the definition makes sense if the
indices range over an arbitrary ordered set. Although martingale theory with
an interval of the line as the index set is of great interest and importance,
here the index set will be discrete.


Submartingales 


Random variables $X_n$ are a submartingale relative to σ-fields $\mathcal{F}_n$ if (i), (ii),
and (iii) of the definition above hold and if this condition holds in place of
(iv):

(iv') with probability 1,

(35.13)    $E[X_{n+1} \| \mathcal{F}_n] \ge X_n.$


As before, the $X_n$ are a submartingale if they are a submartingale with
respect to some sequence $\mathcal{F}_n$, and the special sequence $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$
works if any does. The requirement (35.13) is the same thing as


(35.14)    $\int_A X_{n+1}\, dP \ge \int_A X_n\, dP, \qquad A \in \mathcal{F}_n.$


This extends inductively (see the argument for (35.4)), and so

(35.15)    $E[X_{n+k} \| \mathcal{F}_n] \ge X_n$

for $k \ge 1$. Taking expected values in (35.15) gives

(35.16)    $E[X_1] \le E[X_2] \le \cdots.$


Example 35.7. Suppose that the $\Delta_n$ are independent and integrable, as in
Example 35.1, but assume that $E[\Delta_n]$ is for $n \ge 2$ nonnegative rather than 0.
Then the partial sums $\Delta_1 + \cdots + \Delta_n$ form a submartingale. ■


Example 35.8. Suppose that the $X_n$ are a martingale relative to the $\mathcal{F}_n$.
Then $|X_n|$ is measurable $\mathcal{F}_n$ and integrable, and by Theorem 34.2(iv),
$E[|X_{n+1}| \,\|\, \mathcal{F}_n] \ge |E[X_{n+1} \| \mathcal{F}_n]| = |X_n|$. Thus the $|X_n|$ are a submartingale
relative to the $\mathcal{F}_n$. Note that even if $X_1, \ldots, X_n$ generate $\mathcal{F}_n$, in general
$|X_1|, \ldots, |X_n|$ will generate a σ-field smaller than $\mathcal{F}_n$. ■


Reversing the inequality in (35.13) gives the definition of a supermartin- 
gale. The inequalities in (35.14), (35.15), and (35.16) become reversed as well. 
The theory for supermartingales is of course symmetric to that of submartin- 
gales. 




Gambling 


Consider again the gambler whose fortune after the $n$th play is $X_n$ and
whose information about the game at that time is represented by the σ-field
$\mathcal{F}_n$. If $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, he knows the sequence of his fortunes and
nothing else, but $\mathcal{F}_n$ could be larger. The martingale condition (35.1)
stipulates that his expected or average fortune after the next play equals his
present fortune, and so the martingale is the model for a fair game. Since the
condition (35.13) for a submartingale stipulates that he stands to gain (or at
least not lose) on the average, a submartingale represents a game favorable
to the gambler. Similarly, a supermartingale represents a game unfavorable
to the gambler.†

Examples of such games were studied in Section 7, and some of the results
there have immediate generalizations. Start the martingale at $n = 0$, $X_0$
representing the gambler's initial fortune. The difference $\Delta_n = X_n - X_{n-1}$
represents the amount the gambler wins on the $n$th play,‡ a negative win
being of course a loss. Suppose instead that $\Delta_n$ represents the amount he
wins if he puts up unit stakes. If instead of unit stakes he wagers the amount
$W_n$ on the $n$th play, $W_n \Delta_n$ represents his gain on that play. Suppose that
$W_n \ge 0$, and that $W_n$ is measurable $\mathcal{F}_{n-1}$ to exclude prevision: Before the
$n$th play the information available to the gambler is that in $\mathcal{F}_{n-1}$, and his
choice of stake $W_n$ must be based on this alone. For simplicity take $W_n$
bounded. Then $W_n \Delta_n$ is integrable, and it is measurable $\mathcal{F}_n$ if $\Delta_n$ is, and if
$X_n$ is a martingale, then $E[W_n \Delta_n \| \mathcal{F}_{n-1}] = W_n E[\Delta_n \| \mathcal{F}_{n-1}] = 0$ by (34.2). Thus


(35.17)    $X_0 + W_1 \Delta_1 + \cdots + W_n \Delta_n$


is a martingale relative to the $\mathcal{F}_n$. The sequence $W_1, W_2, \ldots$ represents a
betting system, and transforming a fair game by a betting system preserves
fairness; that is, transforming $X_n$ into (35.17) preserves the martingale
property.
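The preservation of fairness under a betting system can be verified by exhaustive enumeration in a small case. The sketch below is an illustration under stated assumptions: the unit-stake wins are fair $\pm 1$ coin tosses, the horizon is finite (so the stakes are bounded), and the rule `double_after_loss`, which doubles the stake after each loss and resets it to 1 after a win, is a hypothetical system chosen for the example. Each stake depends only on earlier outcomes, which is the requirement that $W_n$ be measurable with respect to the information before the $n$th play.

```python
from fractions import Fraction
from itertools import product

def transformed_fortune(x0, deltas, stake):
    """Return x0 + W_1*d_1 + ... + W_n*d_n, where W_k = stake(d_1, ..., d_{k-1})."""
    fortune = x0
    for k, d in enumerate(deltas):
        w = stake(deltas[:k])      # the stake sees past outcomes only
        fortune += w * d
    return fortune

def double_after_loss(past):
    """Hypothetical system: stake 1, doubled after each loss, reset after a win."""
    w = 1
    for d in past:
        w = 2 * w if d < 0 else 1
    return w

def expected_fortune(x0, n, stake):
    """Exact expected fortune after n fair plays, averaging over all 2^n paths."""
    paths = list(product((-1, 1), repeat=n))
    total = sum(transformed_fortune(x0, p, stake) for p in paths)
    return Fraction(total, len(paths))

flat = expected_fortune(10, 6, lambda past: 1)
doubling = expected_fortune(10, 6, double_after_loss)
```

Both expected fortunes equal the initial fortune 10: the doubling rule changes the distribution of the final fortune but not its mean.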

The various betting systems discussed in Section 7 give rise to various 
martingales, and these martingales are not in general sums of independent 
random variables—are not in general the special martingales of Example 
35.1. If W, assumes only the values 0 and 1, the betting system is a selection 
system; see Section 7. 

If the game is unfavorable to the gambler (that is, if $X_n$ is a
supermartingale) and if $W_n$ is nonnegative, bounded, and measurable $\mathcal{F}_{n-1}$, then the
same argument shows that (35.17) is again a supermartingale, is again
unfavorable. Betting systems are thus of no avail in unfavorable games.

†There is a reversal of terminology here: a subfair game (Section 7) is against the gambler, while
a submartingale favors him.
‡The notation has, of course, changed. The $F_n$ and $X_n$ of Section 7 have become $X_n$ and $\Delta_n$.

The stopping-time arguments of Section 7 also extend. Suppose that $\{X_n\}$
is a martingale relative to $\{\mathcal{F}_n\}$; it may have come from another martingale
via transformation by a betting system. Let $\tau$ be a random variable taking on
nonnegative integers as values, and suppose that


(35.18)    $[\tau = n] \in \mathcal{F}_n.$


If $\tau$ is the time the gambler stops, $[\tau = n]$ is the event he stops just after the
$n$th play, and (35.18) requires that his decision is to depend only on the
information $\mathcal{F}_n$ available to him at that time. His fortune at time $n$ for this
stopping rule is


(35.19)    $X_n^* = \begin{cases} X_n & \text{if } n \le \tau, \\ X_\tau & \text{if } n \ge \tau. \end{cases}$


Here $X_\tau$ (which has value $X_{\tau(\omega)}(\omega)$ at $\omega$) is the gambler's ultimate fortune,
and it is his fortune for all times subsequent to $\tau$.

The problem is to show that $X_0^*, X_1^*, \ldots$ is a martingale relative to
$\mathcal{F}_0, \mathcal{F}_1, \ldots$. First,

$E[|X_n^*|] = \sum_{k=0}^{n-1} \int_{[\tau = k]} |X_k|\, dP + \int_{[\tau \ge n]} |X_n|\, dP \le \sum_{k=0}^{n} E[|X_k|] < \infty.$

Since $[\tau > n] = \Omega - [\tau \le n] \in \mathcal{F}_n$,

$[X_n^* \in H] = \bigcup_{k=0}^{n} [\tau = k,\ X_k \in H] \cup [\tau > n,\ X_n \in H] \in \mathcal{F}_n.$

Moreover,

$\int_A X_n^*\, dP = \int_{A \cap [\tau > n]} X_n\, dP + \int_{A \cap [\tau \le n]} X_\tau\, dP$

and

$\int_A X_{n+1}^*\, dP = \int_{A \cap [\tau > n]} X_{n+1}\, dP + \int_{A \cap [\tau \le n]} X_\tau\, dP.$

Because of (35.3), the right sides here coincide if $A \in \mathcal{F}_n$; this establishes
(35.3) for the sequence $X_0^*, X_1^*, \ldots$, which is thus a martingale. The same
kind of argument works for supermartingales.

Since $X_n^* = X_\tau$ for $n \ge \tau$, $X_n^* \to X_\tau$. As pointed out in Section 7, it is not
always possible to integrate to the limit here. Let $X_n = a + \Delta_1 + \cdots + \Delta_n$,
where the $\Delta_n$ are independent and assume the values $\pm 1$ with probability $\tfrac{1}{2}$
($X_0 = a$), and let $\tau$ be the smallest $n$ for which $\Delta_1 + \cdots + \Delta_n = 1$. Then
$E[X_n^*] = a$ and $X_\tau = a + 1$. On the other hand, if the $X_n^*$ are uniformly
bounded or uniformly integrable, it is possible to integrate to the limit:
$E[X_\tau] = E[X_0]$.
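The failure to integrate to the limit in this example can be seen by exact enumeration. In the sketch below (an illustration; the function names `stopped_path` and `summary` are introduced here, and $a = 5$ is arbitrary), every $\pm 1$ path of length $n$ is examined, the fortune is frozen at $a + 1$ the first time the partial sum reaches 1, and the exact mean of the stopped fortune is returned together with the probability of having stopped by time $n$.

```python
from fractions import Fraction
from itertools import product

def stopped_path(a, deltas):
    """Return (stopped fortune at time n, whether tau <= n) for one +-1 path."""
    s = 0
    for d in deltas:
        s += d
        if s == 1:                 # tau has occurred; the fortune stays at a + 1
            return a + 1, True
    return a + s, False            # tau > n: the fortune is still X_n

def summary(a, n):
    """Exact mean of the stopped fortune and P[tau <= n] over all 2^n paths."""
    results = [stopped_path(a, p) for p in product((-1, 1), repeat=n)]
    mean = Fraction(sum(v for v, _ in results), len(results))
    prob = Fraction(sum(1 for _, hit in results if hit), len(results))
    return mean, prob

mean_10, prob_10 = summary(5, 10)
```

The mean of the stopped fortune stays at $a$ for every $n$, while the probability of having reached $a + 1$ grows with $n$; the rare paths that have not yet stopped carry large losses that balance the account.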




Functions of Martingales 


Convex functions of martingales are submartingales: 


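Theorem 35.1. (i) If $X_1, X_2, \ldots$ is a martingale relative to $\mathcal{F}_1, \mathcal{F}_2, \ldots$, if
$\varphi$ is convex, and if the $\varphi(X_n)$ are integrable, then $\varphi(X_1), \varphi(X_2), \ldots$ is a
submartingale relative to $\mathcal{F}_1, \mathcal{F}_2, \ldots$.

(ii) If $X_1, X_2, \ldots$ is a submartingale relative to $\mathcal{F}_1, \mathcal{F}_2, \ldots$, if $\varphi$ is
nondecreasing and convex, and if the $\varphi(X_n)$ are integrable, then $\varphi(X_1), \varphi(X_2), \ldots$ is
a submartingale relative to $\mathcal{F}_1, \mathcal{F}_2, \ldots$.


PROOF. In the submartingale case, $X_n \le E[X_{n+1} \| \mathcal{F}_n]$, and if $\varphi$ is
nondecreasing, then $\varphi(X_n) \le \varphi(E[X_{n+1} \| \mathcal{F}_n])$. In the martingale case, $X_n =
E[X_{n+1} \| \mathcal{F}_n]$, and so $\varphi(X_n) = \varphi(E[X_{n+1} \| \mathcal{F}_n])$. If $\varphi$ is convex, then by Jensen's
inequality (34.7) for conditional expectations, it follows that $\varphi(E[X_{n+1} \| \mathcal{F}_n]) \le
E[\varphi(X_{n+1}) \| \mathcal{F}_n]$. ■


Example 35.8 is the case of part (i) for $\varphi(x) = |x|$.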


Stopping Times 


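Let $\tau$ be a random variable taking as values positive integers or the special
value $\infty$. It is a stopping time with respect to $\{\mathcal{F}_n\}$ if $[\tau = k] \in \mathcal{F}_k$ for each
finite $k$ (see (35.18)), or, equivalently, if $[\tau \le k] \in \mathcal{F}_k$ for each finite $k$.
Define


(35.20)    $\mathcal{F}_\tau = [A \in \mathcal{F}\colon A \cap [\tau \le k] \in \mathcal{F}_k,\ 1 \le k < \infty].$


This is a σ-field, and the definition is unchanged if $[\tau \le k]$ is replaced by
$[\tau = k]$ on the right. Since clearly $[\tau = j] \in \mathcal{F}_\tau$ for finite $j$, $\tau$ is measurable
$\mathcal{F}_\tau$.

If $\tau(\omega) < \infty$ for all $\omega$ and $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, then $I_A(\omega) = I_A(\omega')$ for all
$A$ in $\mathcal{F}_\tau$ if and only if $X_i(\omega) = X_i(\omega')$ for $i \le \tau(\omega) = \tau(\omega')$: The information
in $\mathcal{F}_\tau$ consists of the values $\tau(\omega), X_1(\omega), \ldots, X_{\tau(\omega)}(\omega)$.

Suppose now that $\tau_1$ and $\tau_2$ are two stopping times and $\tau_1 \le \tau_2$. If
$A \in \mathcal{F}_{\tau_1}$, then $A \cap [\tau_1 \le k] \in \mathcal{F}_k$ and hence $A \cap [\tau_2 \le k] = A \cap [\tau_1 \le k] \cap
[\tau_2 \le k] \in \mathcal{F}_k$: $\mathcal{F}_{\tau_1} \subset \mathcal{F}_{\tau_2}$.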


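Theorem 35.2. If $X_1, \ldots, X_n$ is a submartingale with respect to $\mathcal{F}_1, \ldots, \mathcal{F}_n$
and $\tau_1, \tau_2$ are stopping times satisfying $1 \le \tau_1 \le \tau_2 \le n$, then $X_{\tau_1}, X_{\tau_2}$ is a
submartingale with respect to $\mathcal{F}_{\tau_1}, \mathcal{F}_{\tau_2}$.


This is the optional sampling theorem. The proof will show that $X_{\tau_1}, X_{\tau_2}$ is
a martingale if $X_1, \ldots, X_n$ is.


PROOF. Since the $X_{\tau_i}$ are dominated by $\sum_{k=1}^{n} |X_k|$, they are integrable. It
is required to show that $E[X_{\tau_2} \| \mathcal{F}_{\tau_1}] \ge X_{\tau_1}$, or


(35.21)    $\int_A (X_{\tau_2} - X_{\tau_1})\, dP \ge 0, \qquad A \in \mathcal{F}_{\tau_1}.$


But $A \in \mathcal{F}_{\tau_1}$ implies that $A \cap [\tau_1 < k \le \tau_2] = (A \cap [\tau_1 \le k - 1]) \cap [\tau_2 \le k - 1]^c$
lies in $\mathcal{F}_{k-1}$. If $\Delta_k = X_k - X_{k-1}$, then

$\int_A (X_{\tau_2} - X_{\tau_1})\, dP = \int_A \sum_{k=2}^{n} I_{[\tau_1 < k \le \tau_2]} \Delta_k\, dP = \sum_{k=2}^{n} \int_{A \cap [\tau_1 < k \le \tau_2]} \Delta_k\, dP \ge 0$

by the submartingale property. ■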


Inequalities 


There are two inequalities that are fundamental to the theory of martingales. 


Theorem 35.3. If $X_1, \ldots, X_n$ is a submartingale, then for $\alpha > 0$,

(35.22)    $P\bigl[\max_{i \le n} X_i \ge \alpha\bigr] \le \dfrac{1}{\alpha} E[|X_n|].$


This extends Kolmogorov's inequality: If $S_1, S_2, \ldots$ are partial sums of
independent random variables with mean 0, they form a martingale; if the
variances are finite, then $S_1^2, S_2^2, \ldots$ is a submartingale by Theorem 35.1(i),
and (35.22) for this submartingale is exactly Kolmogorov's inequality (22.9).


PROOF. Let $\tau_2 = n$; let $\tau_1$ be the smallest $k$ such that $X_k \ge \alpha$, if there is
one, and $n$ otherwise. If $M_k = \max_{i \le k} X_i$, then $[M_n \ge \alpha] \cap [\tau_1 \le k] =
[M_k \ge \alpha] \in \mathcal{F}_k$, and hence $[M_n \ge \alpha]$ is in $\mathcal{F}_{\tau_1}$. By Theorem 35.2,


(35.23)    $\alpha P[M_n \ge \alpha] \le \int_{[M_n \ge \alpha]} X_{\tau_1}\, dP \le \int_{[M_n \ge \alpha]} X_n\, dP \le \int_{[M_n \ge \alpha]} X_n^+\, dP \le E[X_n^+] \le E[|X_n|].$ ■


This can also be proved by imitating the argument for Kolmogorov's
inequality in Section 22. For improvements to (35.22), use the other integrals
in (35.23). If $X_1, \ldots, X_n$ is a martingale, $|X_1|, \ldots, |X_n|$ is a submartingale, and
so (35.22) gives $P[\max_{i \le n} |X_i| \ge \alpha] \le \alpha^{-1} E[|X_n|]$.
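For a concrete check of the last inequality, take the simple symmetric random walk $S_k = \Delta_1 + \cdots + \Delta_k$ with fair $\pm 1$ steps, which is a martingale. The sketch below is an illustration only (the function name `maximal_inequality_sides` is introduced here); it enumerates all $2^n$ paths and returns both sides of the inequality as exact fractions.

```python
from fractions import Fraction
from itertools import accumulate, product

def maximal_inequality_sides(n, alpha):
    """Return (P[max_i |S_i| >= alpha], E|S_n| / alpha) for the fair +-1 walk."""
    paths = list(product((-1, 1), repeat=n))
    exceed = 0
    abs_sum = 0
    for p in paths:
        partial = list(accumulate(p))          # S_1, ..., S_n along this path
        if max(abs(s) for s in partial) >= alpha:
            exceed += 1
        abs_sum += abs(partial[-1])
    total = len(paths)
    return Fraction(exceed, total), Fraction(abs_sum, total) / alpha

lhs, rhs = maximal_inequality_sides(10, 3)
tight = maximal_inequality_sides(2, 2)
```

For $n = 2$ and $\alpha = 2$ both sides equal $\tfrac{1}{2}$, so the bound cannot be improved in general without further assumptions.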

The second fundamental inequality requires the notion of an upcrossing.
Let $[\alpha, \beta]$ be an interval ($\alpha < \beta$) and let $X_1, \ldots, X_n$ be random variables.
Inductively define variables $\tau_1, \tau_2, \ldots$:


$\tau_1$ is the smallest $j$ such that $1 \le j \le n$ and $X_j \le \alpha$, and is $n$ if there is no
such $j$;

$\tau_k$ for even $k$ is the smallest $j$ such that $\tau_{k-1} < j \le n$ and $X_j \ge \beta$, and is $n$ if
there is no such $j$;

$\tau_k$ for odd $k$ exceeding 1 is the smallest $j$ such that $\tau_{k-1} < j \le n$ and $X_j \le \alpha$,
and is $n$ if there is no such $j$.


The number $U$ of upcrossings of $[\alpha, \beta]$ by $X_1, \ldots, X_n$ is the largest $i$ such
that $X_{\tau_{2i-1}} \le \alpha < \beta \le X_{\tau_{2i}}$. In the diagram, $n = 20$ and there are three
upcrossings.


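[Figure: a sample path $X_1, \ldots, X_{20}$ plotted against the levels $\alpha$ and $\beta$, with the times $\tau_1, \tau_2, \ldots$ marked on the horizontal axis.]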
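The inductive definition of the $\tau_k$ translates directly into a short program. The following sketch is illustrative; the function names `crossing_times` and `upcrossings` and the sample sequence are assumptions introduced here, and indices are kept 1-based to match the definition.

```python
def crossing_times(xs, alpha, beta):
    """Return [tau_1, ..., tau_n] as 1-based indices, following the definition."""
    n = len(xs)
    taus = []
    prev = 0
    for k in range(1, n + 1):
        if k % 2 == 1:
            hit = lambda x: x <= alpha   # odd k: wait for a value at or below alpha
        else:
            hit = lambda x: x >= beta    # even k: wait for a value at or above beta
        tau = n                          # default when there is no such j
        for j in range(prev + 1, n + 1):
            if hit(xs[j - 1]):
                tau = j
                break
        taus.append(tau)
        prev = tau
    return taus

def upcrossings(xs, alpha, beta):
    """Largest i with X_{tau_{2i-1}} <= alpha < beta <= X_{tau_{2i}} (0 if none)."""
    taus = crossing_times(xs, alpha, beta)
    count = 0
    for i in range(1, len(taus) // 2 + 1):
        low = xs[taus[2 * i - 2] - 1]
        high = xs[taus[2 * i - 1] - 1]
        if low <= alpha < beta <= high:
            count = i
    return count

sample = [3, 0, 2, 5, 1, -1, 4, 6, 0, 5]      # hypothetical path with n = 10
times = crossing_times(sample, 1, 4)
u = upcrossings(sample, 1, 4)
```

For the sample path the crossing times are 2, 4, 5, 7, 9, 10 followed by the default value 10, and there are three upcrossings of $[1, 4]$. A strictly increasing path can have at most one.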


Theorem 35.4. For a submartingale $X_1, \ldots, X_n$, the number $U$ of
upcrossings of $[\alpha, \beta]$ satisfies

(35.24)    $E[U] \le \dfrac{E[|X_n|] + |\alpha|}{\beta - \alpha}.$

PROOF. Let $Y_k = \max\{0, X_k - \alpha\}$ and $\theta = \beta - \alpha$. By Theorem 35.1(ii),
$Y_1, \ldots, Y_n$ is a submartingale. The $\tau_k$ are unchanged if in the definitions
$X_j \le \alpha$ is replaced by $Y_j = 0$ and $X_j \ge \beta$ by $Y_j \ge \theta$, and so $U$ is also the
number of upcrossings of $[0, \theta]$ by $Y_1, \ldots, Y_n$. If $k$ is even and $\tau_{k-1}$ is a
stopping time, then for $j < n$,

$[\tau_k = j] = \bigcup_{i=1}^{j-1} [\tau_{k-1} = i,\ Y_{i+1} < \theta, \ldots, Y_{j-1} < \theta,\ Y_j \ge \theta]$

lies in $\mathcal{F}_j$ and $[\tau_k = n] = [\tau_k \le n - 1]^c$ lies in $\mathcal{F}_n$, and so $\tau_k$ is also a stopping
time. With a similar argument for odd $k$, this shows that the $\tau_k$ are all
stopping times. Since the $\tau_k$ are strictly increasing until they reach $n$, $\tau_n = n$.
Therefore,

$Y_n = Y_{\tau_n} \ge Y_{\tau_n} - Y_{\tau_1} = \sum_{k=2}^{n} (Y_{\tau_k} - Y_{\tau_{k-1}}) = \Sigma_e + \Sigma_o,$

where $\Sigma_e$ and $\Sigma_o$ are the sums over the even $k$ and the odd $k$ in the range
$2 \le k \le n$. By Theorem 35.2, $\Sigma_o$ has nonnegative expected value, and
therefore, $E[Y_n] \ge E[\Sigma_e]$.

If $Y_{\tau_{2i-1}} = 0 < \theta \le Y_{\tau_{2i}}$ (which is the same thing as $X_{\tau_{2i-1}} \le \alpha < \beta \le X_{\tau_{2i}}$),
then the difference $Y_{\tau_{2i}} - Y_{\tau_{2i-1}}$ appears in the sum $\Sigma_e$ and is at least $\theta$. Since
there are $U$ of these differences, $\Sigma_e \ge \theta U$, and therefore $E[Y_n] \ge \theta E[U]$. In
terms of the original variables, this is

$(\beta - \alpha) E[U] \le \int_{[X_n > \alpha]} (X_n - \alpha)\, dP \le E[|X_n|] + |\alpha|.$ ■


In a sense, an upcrossing of $[\alpha, \beta]$ is easy: since the $X_k$ form a
submartingale, they tend to increase. But before another upcrossing can occur, the
sequence must make its way back down below $\alpha$, which it resists. Think of
the extreme case where the $X_k$ are strictly increasing constants. This is
reflected in the proof. Each of $\Sigma_e$ and $\Sigma_o$ has nonnegative expected value,
but for $\Sigma_e$ the proof uses the stronger inequality $E[\Sigma_e] \ge E[\theta U]$.


Convergence Theorems 


The martingale convergence theorem, due to Doob, has a number of forms. 
The simplest one is this: 


Theorem 35.5. Let $X_1, X_2, \ldots$ be a submartingale. If $K = \sup_n E[|X_n|] < \infty$,
then $X_n \to X$ with probability 1, where $X$ is a random variable satisfying
$E[|X|] \le K$.


PROOF. Fix $\alpha$ and $\beta$ for the moment, and let $U_n$ be the number of
upcrossings of $[\alpha, \beta]$ by $X_1, \ldots, X_n$. By the upcrossing theorem, $E[U_n] \le
(E[|X_n|] + |\alpha|)/(\beta - \alpha) \le (K + |\alpha|)/(\beta - \alpha)$. Since $U_n$ is nondecreasing and
$E[U_n]$ is bounded, it follows by the monotone convergence theorem that
$\sup_n U_n$ is integrable and hence finite-valued almost everywhere.

Let $X^*$ and $X_*$ be the limits superior and inferior of the sequence
$X_1, X_2, \ldots$; they may be infinite. If $X_* < \alpha < \beta < X^*$, then $U_n$ must go to
infinity. Since $U_n$ is bounded with probability 1, $P[X_* < \alpha < \beta < X^*] = 0$.
Now

(35.25)    $[X_* < X^*] = \bigcup\, [X_* < \alpha < \beta < X^*],$


where the union extends over all pairs of rationals $\alpha$ and $\beta$. The set on the
left therefore has probability 0.

Thus $X^*$ and $X_*$ are equal with probability 1, and $X_n$ converges to their
common value $X$, which may be $\pm\infty$. By Fatou's lemma, $E[|X|] \le
\liminf_n E[|X_n|] \le K$. Since it is integrable, $X$ is finite with probability 1. ■


If the $X_n$ form a martingale, then by (35.16) applied to the submartingale
$|X_1|, |X_2|, \ldots$ the $E[|X_n|]$ are nondecreasing, so that $K = \lim_n E[|X_n|]$. The
hypothesis in the theorem that $K$ be finite is essential: If $X_n = \Delta_1 + \cdots + \Delta_n$,
where the $\Delta_n$ are independent and assume values $\pm 1$ with probability $\tfrac{1}{2}$,
then $X_n$ does not converge; $E[|X_n|]$ goes to infinity in this case.

If the $X_n$ form a nonnegative martingale, then $E[|X_n|] = E[X_n] = E[X_1]$
by (35.5), and $K$ is necessarily finite.


Example 35.9. The $X_n$ in Example 35.6 are nonnegative, and so $X_n =
Z_n/m^n \to X$, where $X$ is nonnegative and integrable. If $m < 1$, then, since $Z_n$
is an integer, $Z_n = 0$ for large $n$, and the population dies out. In this case,
$X = 0$ with probability 1. Since $E[X_n] = E[X_0] = 1$, this shows that $E[X_n] \to
E[X]$ may fail in Theorem 35.5. ■


Theorem 35.5 has an important application to the martingale of Example 
35.5, and this requires a lemma. 


Lemma. If $Z$ is integrable and $\mathcal{F}_n$ are arbitrary σ-fields, then the random
variables $E[Z \| \mathcal{F}_n]$ are uniformly integrable.


For the definition of uniform integrability, see (16.21). The $\mathcal{F}_n$ must, of
course, lie in the σ-field $\mathcal{F}$, but they need not, for example, be
nondecreasing.


PROOF OF THE LEMMA. Since $|E[Z \| \mathcal{F}_n]| \le E[|Z| \,\|\, \mathcal{F}_n]$, $Z$ may be assumed
nonnegative. Let $A_{\alpha n} = [E[Z \| \mathcal{F}_n] \ge \alpha]$. Since $A_{\alpha n} \in \mathcal{F}_n$,

$\int_{A_{\alpha n}} E[Z \| \mathcal{F}_n]\, dP = \int_{A_{\alpha n}} Z\, dP.$

It is therefore enough to find, for given $\epsilon$, an $\alpha$ such that this last integral is
less than $\epsilon$ for all $n$. Now $\int_A Z\, dP$ is, as a function of $A$, a finite measure
dominated by $P$; by the $\epsilon$-$\delta$ version of absolute continuity (see (32.4)) there
is a $\delta$ such that $P(A) < \delta$ implies that $\int_A Z\, dP < \epsilon$. But $P[E[Z \| \mathcal{F}_n] \ge \alpha] \le
\alpha^{-1} E[E[Z \| \mathcal{F}_n]] = \alpha^{-1} E[Z] < \delta$ for large enough $\alpha$. ■




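Suppose that $\mathcal{F}_n$ are σ-fields satisfying $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$. If the union
$\bigcup_{n=1}^{\infty} \mathcal{F}_n$ generates the σ-field $\mathcal{F}_\infty$, this is expressed by $\mathcal{F}_n \uparrow \mathcal{F}_\infty$. The
requirement is not that $\mathcal{F}_\infty$ coincide with the union, but that it be generated by it.


Theorem 35.6. If $\mathcal{F}_n \uparrow \mathcal{F}_\infty$ and $Z$ is integrable, then

(35.26)    $E[Z \| \mathcal{F}_n] \to E[Z \| \mathcal{F}_\infty]$

with probability 1.


PROOF. According to Example 35.5, the random variables $X_n = E[Z \| \mathcal{F}_n]$
form a martingale relative to the $\mathcal{F}_n$. By the lemma, the $X_n$ are uniformly
integrable. Since $E[|X_n|] \le E[|Z|]$, by Theorem 35.5 the $X_n$ converge to an
integrable $X$. The problem is to identify $X$ with $E[Z \| \mathcal{F}_\infty]$.

Because of the uniform integrability, it is possible (Theorem 16.14) to
integrate to the limit: $\int_A X\, dP = \lim_n \int_A X_n\, dP$. If $A \in \mathcal{F}_k$ and $n \ge k$, then
$\int_A X_n\, dP = \int_A E[Z \| \mathcal{F}_n]\, dP = \int_A Z\, dP$. Therefore, $\int_A X\, dP = \int_A Z\, dP$ for all $A$ in
the $\pi$-system $\bigcup_{k=1}^{\infty} \mathcal{F}_k$; since $X$ is measurable $\mathcal{F}_\infty$, it follows by Theorem 34.1
that $X$ is a version of $E[Z \| \mathcal{F}_\infty]$. ■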
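A simple instance of this convergence can be computed in closed form. In the sketch below, which is an illustration and not taken from the text, $\Omega = (0, 1]$ carries Lebesgue measure, the $n$th σ-field is generated by the dyadic intervals of rank $n$, and $Z(\omega) = \omega^2$. The conditional expected value is then the average of $\omega^2$ over the dyadic interval containing $\omega$, and since these σ-fields generate the Borel σ-field, the limit should be $Z$ itself. The function name `cond_exp_square` and the sample point $\omega = 1/3$ are assumptions made for the example.

```python
from fractions import Fraction
from math import ceil

def cond_exp_square(omega, n):
    """Average of x^2 over the dyadic interval (k/2^n, (k+1)/2^n] containing omega."""
    k = ceil(omega * 2**n) - 1                 # index of the interval containing omega
    # 2^n * integral of x^2 over the interval = ((k+1)^3 - k^3) / (3 * 4^n)
    return Fraction((k + 1)**3 - k**3, 3 * 4**n)

omega = Fraction(1, 3)                          # hypothetical sample point
errors = [abs(cond_exp_square(omega, n) - omega**2) for n in range(1, 11)]
```

Because $x^2$ varies by at most $2 \cdot 2^{-n}$ over an interval of length $2^{-n}$ inside $(0, 1]$, the error at rank $n$ is bounded by $2^{1-n}$.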


Applications: Derivatives 


Theorem 35.7. Suppose that $(\Omega, \mathcal{F}, P)$ is a probability space, $\nu$ is a finite
measure on $\mathcal{F}$, and $\mathcal{F}_n \uparrow \mathcal{F}_\infty \subset \mathcal{F}$. Suppose that $P$ dominates $\nu$ on each $\mathcal{F}_n$,
and let $X_n$ be the corresponding Radon-Nikodym derivatives. Then $X_n \to X$
with probability 1, where $X$ is integrable.


(i) If $P$ dominates $\nu$ on $\mathcal{F}_\infty$, then $X$ is the corresponding Radon-Nikodym
derivative.


(ii) If $P$ and $\nu$ are mutually singular on $\mathcal{F}_\infty$, then $X = 0$ with probability 1.


PROOF. The situation is that of Example 35.2. The density $X_n$ is
measurable $\mathcal{F}_n$ and satisfies (35.9). Since $X_n$ is nonnegative, $E[|X_n|] = E[X_n] =
\nu(\Omega)$, and it follows by Theorem 35.5 that $X_n$ converges to an integrable $X$.
The limit $X$ is measurable $\mathcal{F}_\infty$.

Suppose that $P$ dominates $\nu$ on $\mathcal{F}_\infty$ and let $Z$ be the Radon-Nikodym
derivative: $Z$ is measurable $\mathcal{F}_\infty$, and $\int_A Z\, dP = \nu(A)$ for $A \in \mathcal{F}_\infty$. It follows
that $\int_A Z\, dP = \int_A X_n\, dP$ for $A$ in $\mathcal{F}_n$, and so $X_n = E[Z \| \mathcal{F}_n]$. Now Theorem
35.6 implies that $X_n \to E[Z \| \mathcal{F}_\infty] = Z$.

Suppose, on the other hand, that $P$ and $\nu$ are mutually singular on $\mathcal{F}_\infty$, so
that there exists a set $S$ in $\mathcal{F}_\infty$ such that $\nu(S) = 0$ and $P(S) = 1$. By Fatou's
lemma $\int_A X\, dP \le \liminf_n \int_A X_n\, dP$. If $A \in \mathcal{F}_k$, then $\int_A X_n\, dP = \nu(A)$ for $n \ge k$,
and so $\int_A X\, dP \le \nu(A)$ for $A$ in the field $\bigcup_{k=1}^{\infty} \mathcal{F}_k$. It follows by the monotone
class theorem that this holds for all $A$ in $\mathcal{F}_\infty$. Therefore, $\int X\, dP = \int_S X\, dP \le
\nu(S) = 0$, and $X$ vanishes with probability 1. ■




Example 35.10. As in Example 35.3, let $\nu$ be a finite measure on the unit
interval with Lebesgue measure $(\Omega, \mathcal{F}, P)$. For $\mathcal{F}_n$ the σ-field generated by
the dyadic intervals of rank $n$, (35.10) gives $X_n$. In this case $\mathcal{F}_n \uparrow \mathcal{F}_\infty = \mathcal{F}$.
For each $\omega$ and $n$ choose the dyadic rationals $a_n(\omega) = k2^{-n}$ and $b_n(\omega) =
(k + 1)2^{-n}$ for which $a_n(\omega) < \omega \le b_n(\omega)$. By Theorem 35.7, if $F$ is the
distribution function for $\nu$, then


(35.27)    $\dfrac{F(b_n(\omega)) - F(a_n(\omega))}{b_n(\omega) - a_n(\omega)} \to X(\omega)$


except on a set of Lebesgue measure 0.

According to Theorem 31.2, $F$ has a derivative $F'$ except on a set of
Lebesgue measure 0, and since the intervals $(a_n(\omega), b_n(\omega)]$ contract to $\omega$, the
difference ratio (35.27) converges almost everywhere to $F'(\omega)$ (see (31.8)).
This identifies $X$. Since (35.27) involves intervals $(a_n(\omega), b_n(\omega)]$ of a special
kind, it does not quite imply Theorem 31.2.

By Theorem 35.7, $X = F'$ is the density for $\nu$ in the absolutely continuous
case, and $X = F' = 0$ (except on a set of Lebesgue measure 0) in the singular
case, facts proved in a different way in Section 31. The singular case gives
another example where $E[X_n] \to E[X]$ fails in Theorem 35.5. ■


Likelihood Ratios


Return to Example 35.4: $\nu = Q$ is a probability measure, $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$
for random variables $Y_n$, and the Radon-Nikodym derivative or likelihood
ratio $X_n$ has the form (35.11) for densities $p_n$ and $q_n$ on $R^n$. By Theorem
35.7 the $X_n$ converge to some $X$ which is integrable and measurable
$\mathcal{F}_\infty = \sigma(Y_1, Y_2, \ldots)$.

If the $Y_n$ are independent under $P$ and under $Q$, and if the densities are
different, then $P$ and $Q$ are mutually singular on $\sigma(Y_1, Y_2, \ldots)$, as shown in
Example 35.4. In this case $X = 0$ and the likelihood ratio converges to 0 on a
set of $P$-measure 1. The statistical relevance of this is that the smaller $X_n$ is
the more strongly one prefers $P$ over $Q$ as an explanation of the observation
$(Y_1, \ldots, Y_n)$, and $X_n$ goes to 0 with probability 1 if $P$ is in fact the measure
governing the $Y_n$.

It might be thought that a disingenuous experimenter could bias his results
by stopping at an $X_n$ he likes: a large value if his prejudices favor $Q$, a
small value if they favor $P$. This is not so, as the following analysis shows. For
this argument $P$ must dominate $Q$ on each $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$, but the
likelihood ratio $X_n$ need not have any special form.

Let $\tau$ be a positive-integer-valued random variable representing the time
the experimenter stops. Assume that $\tau$ is finite, and to exclude prevision,
assume that it is a stopping time. The σ-field $\mathcal{F}_\tau$ defined by (35.20)
represents the information available at time $\tau$, and the problem is to show that $X_\tau$
is the likelihood ratio (Radon-Nikodym derivative) for $Q$ with respect to $P$
on $\mathcal{F}_\tau$. First, $X_\tau$ is clearly measurable $\mathcal{F}_\tau$. Second, if $A \in \mathcal{F}_\tau$, then $A \cap [\tau =
n] \in \mathcal{F}_n$, and therefore

$\int_A X_\tau\, dP = \sum_{n=1}^{\infty} \int_{A \cap [\tau = n]} X_n\, dP = \sum_{n=1}^{\infty} Q(A \cap [\tau = n]) = Q(A),$

as required.
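The identity just derived can be verified exactly in a small discrete analogue. The sketch below departs from the density setting of Example 35.4 and should be read as an illustration under assumptions: the observations are 0-1 valued, independent with success probability $p$ under $P$ and $q$ under $Q$; the stopping rule (stop the first time the likelihood ratio reaches a threshold, or at a fixed horizon) and all function names are hypothetical. For each $k$ it compares $\int_{[\tau = k]} X_\tau\, dP$ with $Q[\tau = k]$.

```python
from fractions import Fraction
from itertools import product

def likelihood_ratio(ys, p, q):
    """X_n = product of q(y_i)/p(y_i) for Bernoulli observations y_i in {0, 1}."""
    x = Fraction(1)
    for y in ys:
        x *= (q / p) if y == 1 else ((1 - q) / (1 - p))
    return x

def stopped_ratio(path, p, q, threshold):
    """Return (tau, X_tau): stop at the first n with X_n >= threshold, else at the horizon."""
    for n in range(1, len(path) + 1):
        x = likelihood_ratio(path[:n], p, q)
        if x >= threshold:
            return n, x
    return len(path), x

def path_prob(path, r):
    """Probability of a 0-1 path under independent Bernoulli(r) observations."""
    ones = sum(path)
    return r**ones * (1 - r)**(len(path) - ones)

def check(p, q, threshold, horizon):
    """For each k, accumulate the P-integral of X_tau over [tau = k] and Q[tau = k]."""
    lhs, rhs = {}, {}
    for path in product((0, 1), repeat=horizon):
        tau, x = stopped_ratio(path, p, q, threshold)
        lhs[tau] = lhs.get(tau, 0) + x * path_prob(path, p)
        rhs[tau] = rhs.get(tau, 0) + path_prob(path, q)
    return lhs, rhs

lhs, rhs = check(Fraction(1, 2), Fraction(2, 3), 2, 8)
```

The two dictionaries agree entry by entry, and summing over $k$ gives $E_P[X_\tau] = 1$: stopping at a favorable moment does not change the fact that $X_\tau$ is the likelihood ratio on the information available at time $\tau$.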


Reversed Martingales 


A left-infinite sequence $\ldots, X_{-2}, X_{-1}$ is a martingale relative to σ-fields
$\ldots, \mathcal{F}_{-2}, \mathcal{F}_{-1}$ if conditions (ii) and (iii) in the definition of martingale
are satisfied for $n \le -1$ and conditions (i) and (iv) are satisfied for $n < -1$.
Such a sequence is a reversed or backward martingale.


Theorem 35.8. For a reversed martingale, $\lim_{n \to \infty} X_{-n} = X$ exists and is
integrable, and $E[X] = E[X_{-n}]$ for all $n$.


PROOF. The proof is almost the same as that for Theorem 35.5. Let $X^*$
and $X_*$ be the limits superior and inferior of the sequence $X_{-1}, X_{-2}, \ldots$.
Again (35.25) holds. Let $U_n$ be the number of upcrossings of $[\alpha, \beta]$ by
$X_{-n}, \ldots, X_{-1}$. By the upcrossing theorem, $E[U_n] \le (E[|X_{-1}|] + |\alpha|)/(\beta - \alpha)$.
Again $E[U_n]$ is bounded, and so $\sup_n U_n$ is finite with probability 1 and the
sets in (35.25) have probability 0.

Therefore, $\lim_{n \to \infty} X_{-n} = X$ exists with probability 1. By the property
(35.4) for martingales, $X_{-n} = E[X_{-1} \| \mathcal{F}_{-n}]$ for $n = 1, 2, \ldots$. The lemma above
implies that the $X_{-n}$ are uniformly integrable. Therefore, $X$ is
integrable and $E[X]$ is the limit of the $E[X_{-n}]$; these all have the same value
by (35.5). ■


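If $\mathcal{F}_n$ are σ-fields satisfying $\mathcal{F}_1 \supset \mathcal{F}_2 \supset \cdots$, then the intersection $\bigcap_{n=1}^{\infty} \mathcal{F}_n =
\mathcal{F}_0$ is also a σ-field, and the relation is expressed by $\mathcal{F}_n \downarrow \mathcal{F}_0$.


Theorem 35.9. If $\mathcal{F}_n \downarrow \mathcal{F}_0$ and $Z$ is integrable, then

(35.28)    $E[Z \| \mathcal{F}_n] \to E[Z \| \mathcal{F}_0]$

with probability 1.


PROOF. If $X_{-n} = E[Z \| \mathcal{F}_n]$, then $\ldots, X_{-2}, X_{-1}$ is a martingale relative
to $\ldots, \mathcal{F}_2, \mathcal{F}_1$. By the preceding theorem, $E[Z \| \mathcal{F}_n]$ converges as $n \to \infty$ to
an integrable $X$ and by the lemma, the $E[Z \| \mathcal{F}_n]$ are uniformly integrable. As
the limit of the $E[Z \| \mathcal{F}_n]$ for $n \ge k$, $X$ is measurable $\mathcal{F}_k$; $k$ being arbitrary,
$X$ is measurable $\mathcal{F}_0$.

By uniform integrability, $A \in \mathcal{F}_0$ implies that

$\int_A X\, dP = \lim_n \int_A E[Z \| \mathcal{F}_n]\, dP = \lim_n \int_A E[E[Z \| \mathcal{F}_n] \| \mathcal{F}_0]\, dP$
$= \lim_n \int_A E[Z \| \mathcal{F}_0]\, dP = \int_A E[Z \| \mathcal{F}_0]\, dP.$

Thus $X$ is a version of $E[Z \| \mathcal{F}_0]$. ■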


Theorems 35.6 and 35.9 are parallel. There is an essential difference
between Theorems 35.5 and 35.8, however. In the latter, the martingale has a
last random variable, namely $X_{-1}$, and so it is unnecessary in proving
convergence to assume the $E[|X_n|]$ bounded. On the other hand, the proof in
Theorem 35.8 that $X$ is integrable would not work for a submartingale.


Applications: de Finetti’s Theorem 


Let 6, X,, X5,... be random variables such that 0 < 0 < 1 and, conditionally 
on 0, the X, are independent and assume the values 1 and 0 with probabili- 


ties 0 and 1 — @: for u,,...,u, a sequence of 0’s and 1’s, 
(35.29) PIX, =u,,...,X, = ujj) 6° (ae 
where s =U, + Fu. 
To see that such sequences exist, let 0, Zi, Z,,... be independent random vari- 


ables, where 0 has an arbitrarily prescribed distribution supported by [0, 1] and the Z, 
are uniformly distributed over [0, 1]. Put X, =I; <o} If, for instance, f(x) =x(1 — x) 
—-P[Z, xx, Z,>x], then P[X,=1, X,= Olle] = f(@(w)) by (33.13). The obvious 
extension establishes (35.29). 


Integrate (35.29):

(35.30)   $P[X_1 = u_1, \ldots, X_n = u_n] = E[\theta^s (1 - \theta)^{n-s}]$.

Thus $\{X_n\}$ is a mixture of Bernoulli processes. It is clear from (35.30) that the
$X_n$ are exchangeable in the sense that for each n the distribution of
$(X_1, \ldots, X_n)$ is invariant under permutations. According to the following
theorem of de Finetti, every exchangeable sequence is a mixture of Bernoulli
sequences.
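As a small illustration of (35.30), the sketch below takes θ to be uniform on [0, 1]. That prior is an assumption made here only because the expectation then has the closed form s!(n − s)!/(n + 1)! (a Beta integral); the text allows any distribution on [0, 1]. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def mixture_prob(us):
    """P[X_1=u_1,...,X_n=u_n] = E[theta^s (1-theta)^(n-s)] for theta uniform on [0, 1].

    For the uniform prior (an assumption of this sketch) the expectation
    equals s! (n-s)! / (n+1)!, which is computed exactly.
    """
    n, s = len(us), sum(us)
    return Fraction(factorial(s) * factorial(n - s), factorial(n + 1))


def is_exchangeable(n):
    """Check that permuting a 0-1 string of length n never changes its probability."""
    for us in product((0, 1), repeat=n):
        base = mixture_prob(us)
        if any(mixture_prob(p) != base for p in permutations(us)):
            return False
    return True
```

The probabilities depend on the string only through s, which is exactly why the sequence is exchangeable. It is not independent, however: under this prior P[X_1 = 1, X_2 = 1] = 1/3, whereas P[X_1 = 1] P[X_2 = 1] = 1/4.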


Theorem 35.10. If the random variables $X_1, X_2, \ldots$ are exchangeable and
take values 0 and 1, then there is a random variable θ for which (35.29) and
(35.30) hold.


PROOF. Let $S_m = X_1 + \cdots + X_m$. If $t \le m$, then

   $P[S_m = t] = \sum_{u_1, \ldots, u_m} P[X_1 = u_1, \ldots, X_m = u_m]$,

where the sum extends over the sequences for which $u_1 + \cdots + u_m = t$. By
exchangeability, the terms on the right are all equal, and since there are $\binom{m}{t}$
of them,

   $P[X_1 = u_1, \ldots, X_m = u_m \mid S_m = t] = \binom{m}{t}^{-1}$.


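Suppose that $s \le n \le m$ and $u_1 + \cdots + u_n = s \le t \le m$; add out the
$u_{n+1}, \ldots, u_m$ that sum to $t - s$:

   $P[X_1 = u_1, \ldots, X_n = u_n \mid S_m = t] = \binom{m-n}{t-s} \Big/ \binom{m}{t} = f_{n,s,m}(t/m)$,

where

   $f_{n,s,m}(x) = \prod_{i=0}^{s-1} \left(x - \frac{i}{m}\right) \prod_{i=0}^{n-s-1} \left(1 - x - \frac{i}{m}\right) \Big/ \prod_{i=0}^{n-1} \left(1 - \frac{i}{m}\right)$.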
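The identity between the ratio of binomial coefficients and the product formula can be checked in exact arithmetic. The following is only a sketch; the function names `f` and `conditional_prob` are hypothetical, and the product formula is the one written out as $f_{n,s,m}$: a product of s factors $(x - i/m)$ and n − s factors $(1 - x - i/m)$, divided by the product of n factors $(1 - i/m)$.

```python
from fractions import Fraction
from math import comb


def f(n, s, m, x):
    """f_{n,s,m}(x): products of (x - i/m) and (1 - x - i/m) over the product of (1 - i/m)."""
    x = Fraction(x)
    num = Fraction(1)
    for i in range(s):
        num *= x - Fraction(i, m)
    for i in range(n - s):
        num *= 1 - x - Fraction(i, m)
    den = Fraction(1)
    for i in range(n):
        den *= 1 - Fraction(i, m)
    return num / den


def conditional_prob(n, s, m, t):
    """C(m-n, t-s) / C(m, t): chance of a fixed pattern with s ones in n places, given S_m = t."""
    return Fraction(comb(m - n, t - s), comb(m, t))
```

For fixed n and s, letting m grow with x held fixed makes every i/m vanish, so $f_{n,s,m}(x)$ approaches $x^s(1 - x)^{n-s}$; this is the limit used later in the proof.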


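The preceding equations still hold if further constraints $S_{m+1} = t_1, \ldots, S_{m+j}
= t_j$ are joined to $S_m = t$. Therefore, $P[X_1 = u_1, \ldots, X_n = u_n \| S_m, \ldots, S_{m+j}] =
f_{n,s,m}(S_m/m)$.

Let $\mathcal{S}_m = \sigma(S_m, S_{m+1}, \ldots)$ and $\mathcal{S} = \bigcap_m \mathcal{S}_m$. Now fix n and $u_1, \ldots, u_n$, and
suppose that $u_1 + \cdots + u_n = s$. Let $j \to \infty$ and apply Theorem 35.6:
$P[X_1 = u_1, \ldots, X_n = u_n \| \mathcal{S}_m] = f_{n,s,m}(S_m/m)$. Let $m \to \infty$ and apply Theorem 35.9:

   $P[X_1 = u_1, \ldots, X_n = u_n \| \mathcal{S}]_\omega = \lim_m f_{n,s,m}(S_m(\omega)/m)$

holds for ω outside a set of probability 0.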

Fix such an ω and suppose that $\{S_m(\omega)/m\}$ has two distinct limit points.
Since the distance from each $S_m(\omega)/m$ to the next is less than 2/m, it
follows that the set of limit points must fill a nondegenerate interval. But
$\lim_k x_{m_k} = x$ implies $\lim_k f_{n,s,m_k}(x_{m_k}) = x^s(1 - x)^{n-s}$, and so it follows
further that $x^s(1 - x)^{n-s}$ must be constant over this interval, which is
impossible. Therefore, $S_m(\omega)/m$ must converge to some limit θ(ω). This shows that
$P[X_1 = u_1, \ldots, X_n = u_n \| \mathcal{S}] = \theta^s(1 - \theta)^{n-s}$ with probability 1. Take a
conditional expectation with respect to σ(θ), and (35.29) follows. ■


Bayes Estimation 


From the Bayes point of view in statistics, the θ in (35.29) is a parameter
governed by some a priori distribution known to the statistician. For given
$X_1, \ldots, X_n$, the Bayes estimate of θ is $E[\theta \| X_1, \ldots, X_n]$. The problem is to
show that this estimate is consistent in the sense that

(35.31)   $E[\theta \| X_1, \ldots, X_n] \to \theta$

with probability 1. By Theorem 35.6, $E[\theta \| X_1, \ldots, X_n] \to E[\theta \| \mathcal{F}_\infty]$, where
$\mathcal{F}_\infty = \sigma(X_1, X_2, \ldots)$, and so what must be shown is that $E[\theta \| \mathcal{F}_\infty] = \theta$ with
probability 1.

By an elementary argument that parallels the unconditional case, it follows
from (35.29) for $S_n = X_1 + \cdots + X_n$ that $E[S_n \| \theta] = n\theta$ and $E[(S_n - n\theta)^2 \| \theta]
= n\theta(1 - \theta)$. Hence $E[(n^{-1}S_n - \theta)^2] \le n^{-1}$, and by Chebyshev's inequality
$n^{-1}S_n$ converges in probability to θ. Therefore (Theorem 20.5), $\lim_k n_k^{-1} S_{n_k}
= \theta$ with probability 1 for some subsequence. Thus θ = θ' with probability 1
for a θ' measurable $\mathcal{F}_\infty$, and $E[\theta \| \mathcal{F}_\infty] = E[\theta' \| \mathcal{F}_\infty] = \theta' = \theta$ with probability 1.
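The consistency (35.31) can be watched numerically in one special case. The sketch below assumes a Beta(a, b) prior for θ, which the text does not require; for that prior the Bayes estimate has the standard closed form (a + S_n)/(a + b + n). The function name and the fixed seed are hypothetical choices, and the value `theta` passed in stands for the realized value of θ.

```python
import random


def bayes_estimates(theta, n, a=1.0, b=1.0, seed=0):
    """Posterior means E[theta | X_1..X_k], k = 1..n, under an assumed Beta(a, b) prior.

    Conditionally on theta the X_k are independent 0-1 variables with
    P[X_k = 1] = theta, as in (35.29).
    """
    rng = random.Random(seed)
    successes = 0
    estimates = []
    for k in range(1, n + 1):
        successes += 1 if rng.random() <= theta else 0
        estimates.append((a + successes) / (a + b + k))
    return estimates
```

Since (a + S_n)/(a + b + n) differs from S_n/n by a term of order 1/n, the convergence of the estimate is in this case just the convergence of S_n/n to θ.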


A Central Limit Theorem* 


Suppose $X_1, X_2, \ldots$ is a martingale relative to $\mathcal{F}_1, \mathcal{F}_2, \ldots$, and put $Y_n = X_n
- X_{n-1}$ ($Y_1 = X_1$), so that

(35.32)   $E[Y_n \| \mathcal{F}_{n-1}] = 0$.

View $Y_n$ as the gambler's gain on the nth trial in a fair game. For example, if
$\Delta_1, \Delta_2, \ldots$ are independent and have mean 0, $\mathcal{F}_n = \sigma(\Delta_1, \ldots, \Delta_n)$, $W_n$ is
measurable $\mathcal{F}_{n-1}$, and $Y_n = W_n \Delta_n$, then (35.32) holds (see (35.17)). A
specialization of this case shows that $X_n = \sum_{k=1}^n Y_k$ need not be approximately
normally distributed for large n.


Example 35.11. Suppose that $\Delta_n$ takes the values ±1 with probability 1/2
each and $W_1 = 0$, and suppose that $W_n = 1$ for $n \ge 2$ if $\Delta_1 = 1$, while $W_n = 2$
for $n \ge 2$ if $\Delta_1 = -1$. If $S_n = \Delta_2 + \cdots + \Delta_n$, then $X_n$ is $S_n$ or $2S_n$ according
as $\Delta_1$ is +1 or −1. Since $S_n/\sqrt{n}$ has approximately the standard normal
distribution, the approximate distribution of $X_n/\sqrt{n}$ is a mixture, with equal
weights, of the centered normal distributions with standard deviations 1
and 2. ■


*This topic, which requires the limit theory of Chapter 5, may be omitted.

To understand this phenomenon, consider conditional variances. Suppose
for simplicity that the $Y_n$ are bounded, and define

(35.33)   $\sigma_n^2 = E[Y_n^2 \| \mathcal{F}_{n-1}]$

(take $\mathcal{F}_0 = \{\emptyset, \Omega\}$). Consider the stopping times

(35.34)   $\nu_t = \min[n: \sum_{k=1}^n \sigma_k^2 \ge t]$.

Under appropriate conditions, $X_{\nu_t}/\sqrt{t}$ will be approximately normally
distributed for large t. Consider the preceding example. Roughly: If $\Delta_1 = +1$,
then $\sum_{k=1}^n \sigma_k^2 = n - 1$, and so $\nu_t \approx t$ and $X_{\nu_t}/\sqrt{t} \approx S_t/\sqrt{t}$; if $\Delta_1 = -1$, then
$\sum_{k=1}^n \sigma_k^2 = 4(n - 1)$, and so $\nu_t \approx t/4$ and $X_{\nu_t}/\sqrt{t} \approx 2S_{t/4}/\sqrt{t} = S_{t/4}/\sqrt{t/4}$.
In either case, $X_{\nu_t}/\sqrt{t}$ approximately follows the standard normal law.

If the nth play takes $\sigma_n^2$ units of time, then $\nu_t$ is essentially the number of
plays that take place during the first t units of time. This change of the time
scale stabilizes the rate at which money changes hands.
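The effect of the time change in Example 35.11 can be seen in a small simulation. This is a sketch under stated assumptions: the function name `simulate` is hypothetical, and the stopping index is computed from the fact that in this example $\sigma_1^2 = 0$ and $\sigma_k^2 = W_k^2$ for $k \ge 2$, so that $\nu_t$ is the least n with $(n - 1)W^2 \ge t$.

```python
import math
import random


def simulate(t, rng):
    """Return (X_n / sqrt(n), X_{nu_t} / sqrt(t)) for one path of Example 35.11, n = t + 1."""
    n = t + 1
    # W_k for k >= 2 is 1 if Delta_1 = +1 and 2 if Delta_1 = -1.
    w = 1 if rng.random() < 0.5 else 2
    # nu_t = min[n: sum_{k<=n} sigma_k^2 >= t] with sigma_1^2 = 0, sigma_k^2 = w^2.
    nu = math.ceil(t / w**2) + 1
    x = 0
    x_at_nu = 0
    for k in range(2, n + 1):  # Y_1 = W_1 * Delta_1 = 0, so the sum starts at k = 2
        x += w * (1 if rng.random() < 0.5 else -1)
        if k == nu:
            x_at_nu = x
    return x / math.sqrt(n), x_at_nu / math.sqrt(t)
```

Averaging the squares over many paths gives a second moment near 2.5 for the fixed-time ratio (the equal mixture of variances 1 and 4) and near 1 for the time-changed ratio, in line with the rough calculation above.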


Theorem 35.11. Suppose the $Y_n = X_n - X_{n-1}$ are uniformly bounded and
satisfy (35.32), and assume that $\sum_n \sigma_n^2 = \infty$ with probability 1. Then $X_{\nu_t}/\sqrt{t} \Rightarrow N$.


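This will be deduced from a more general result, one that contains the
Lindeberg theorem. Suppose that, for each n, $X_{n1}, X_{n2}, \ldots$ is a martingale
with respect to $\mathcal{F}_{n1}, \mathcal{F}_{n2}, \ldots$. Define $Y_{nk} = X_{nk} - X_{n,k-1}$, suppose the $Y_{nk}$
have second moments, and put $\sigma_{nk}^2 = E[Y_{nk}^2 \| \mathcal{F}_{n,k-1}]$ ($\mathcal{F}_{n0} = \{\emptyset, \Omega\}$). The
probability space may vary with n. If the martingale is originally defined only
for $1 \le k \le r_n$, take $Y_{nk} = 0$ and $\mathcal{F}_{nk} = \mathcal{F}_{n r_n}$ for $k > r_n$. Assume that $\sum_{k=1}^\infty Y_{nk}$
and $\sum_{k=1}^\infty \sigma_{nk}^2$ converge with probability 1.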


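Theorem 35.12. Suppose that

(35.35)   $\sum_{k=1}^\infty \sigma_{nk}^2 \to_P \sigma^2$,

where σ is a positive constant, and that

(35.36)   $\sum_{k=1}^\infty E[Y_{nk}^2 I_{[|Y_{nk}| \ge \epsilon]}] \to 0$

for each ε. Then $\sum_{k=1}^\infty Y_{nk} \Rightarrow \sigma N$.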


PROOF OF THEOREM 35.11. The proof will be given for t going to infinity
through the integers.† Let $Y_{nk} = I_{[\nu_n \ge k]} Y_k/\sqrt{n}$ and $\mathcal{F}_{nk} = \mathcal{F}_k$. From $[\nu_n \ge k]
= [\sum_{j=1}^{k-1} \sigma_j^2 < n] \in \mathcal{F}_{k-1}$ follow $E[Y_{nk} \| \mathcal{F}_{n,k-1}] = 0$ and $\sigma_{nk}^2 = E[Y_{nk}^2 \| \mathcal{F}_{n,k-1}]
= I_{[\nu_n \ge k]} \sigma_k^2/n$. If K bounds the $|Y_k|$, then $1 \le \sum_{k=1}^\infty \sigma_{nk}^2 = n^{-1} \sum_{k=1}^{\nu_n} \sigma_k^2 \le 1 +
K^2/n$, so that (35.35) holds for σ = 1. For n large enough that $K/\sqrt{n} < \epsilon$,
the sum in (35.36) vanishes. Theorem 35.12 therefore applies, and
$\sum_{k=1}^\infty Y_{nk} = X_{\nu_n}/\sqrt{n} \Rightarrow N$. ■

†For the general case, first check that the proof of Theorem 35.12 goes through without change
if n is replaced by a parameter going continuously to infinity.


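PROOF OF THEOREM 35.12. Assume at first that there is a constant c such
that

(35.37)   $\sum_{k=1}^\infty \sigma_{nk}^2 \le c$,

which in fact suffices for the application to Theorem 35.11.

Write $S_k = \sum_{j=1}^k Y_{nj}$ ($S_0 = 0$), $S_\infty = \sum_{j=1}^\infty Y_{nj}$, $\Sigma_k = \sum_{j=1}^k \sigma_{nj}^2$ ($\Sigma_0 = 0$), and
$\Sigma_\infty = \sum_{j=1}^\infty \sigma_{nj}^2$; the dependence on n is suppressed in the notation. To prove
$E[e^{itS_\infty}] \to e^{-t^2\sigma^2/2}$, observe first that

   $|E[e^{itS_\infty} - e^{-t^2\sigma^2/2}]|$
   $= |E[e^{itS_\infty}(1 - e^{t^2\Sigma_\infty/2} e^{-t^2\sigma^2/2}) + e^{-t^2\sigma^2/2}(e^{itS_\infty} e^{t^2\Sigma_\infty/2} - 1)]|$
   $\le E[|1 - e^{t^2\Sigma_\infty/2} e^{-t^2\sigma^2/2}|] + |E[e^{itS_\infty} e^{t^2\Sigma_\infty/2} - 1]| = A + B.$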


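The term A on the right goes to 0 as $n \to \infty$, because by (35.35) and (35.37)
the integrand is bounded and goes to 0 in probability.
The integrand in B is

   $\sum_{k=1}^\infty e^{itS_{k-1}} (e^{itY_{nk}} - e^{-t^2\sigma_{nk}^2/2}) e^{t^2\Sigma_k/2}$,

because the mth partial sum here telescopes to $e^{itS_m} e^{t^2\Sigma_m/2} - 1$. Since, by
(35.37), this partial sum is bounded uniformly in m, and since $S_{k-1}$ and $\Sigma_k$
are measurable $\mathcal{F}_{n,k-1}$, it follows (Theorem 16.7) that

   $B = |\sum_{k=1}^\infty E[e^{itS_{k-1}} e^{t^2\Sigma_k/2} (e^{itY_{nk}} - e^{-t^2\sigma_{nk}^2/2})]|$
   $= |\sum_{k=1}^\infty E[e^{itS_{k-1}} e^{t^2\Sigma_k/2} E[e^{itY_{nk}} - e^{-t^2\sigma_{nk}^2/2} \| \mathcal{F}_{n,k-1}]]|$
   $\le e^{t^2c/2} \sum_{k=1}^\infty E[|E[e^{itY_{nk}} - e^{-t^2\sigma_{nk}^2/2} \| \mathcal{F}_{n,k-1}]|].$

To complete the proof (under the temporary assumption (35.37)), it is enough
to show that this last sum goes to 0.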
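The telescoping step used for the integrand in B is purely algebraic and can be checked numerically. In the sketch below the names are hypothetical: `ys` and `sig2` are arbitrary finite lists standing in for the increments and the conditional variances, and the identity tested is that the sum over k of $e^{itS_{k-1}}(e^{ity_k} - e^{-t^2 v_k/2})e^{t^2\Sigma_k/2}$ equals $e^{itS_m}e^{t^2\Sigma_m/2} - 1$, where $S_k$ and $\Sigma_k$ are the partial sums of the $y_j$ and the $v_j$.

```python
import cmath
import math


def partial_sum(t, ys, sig2):
    """Sum over k of exp(itS_{k-1}) (exp(it y_k) - exp(-t^2 v_k / 2)) exp(t^2 Sigma_k / 2)."""
    total = 0j
    s_prev = 0.0   # S_{k-1}
    cum = 0.0      # Sigma_k
    for y, v in zip(ys, sig2):
        cum += v
        total += (cmath.exp(1j * t * s_prev)
                  * (cmath.exp(1j * t * y) - math.exp(-t * t * v / 2))
                  * math.exp(t * t * cum / 2))
        s_prev += y
    return total


def closed_form(t, ys, sig2):
    """exp(itS_m) exp(t^2 Sigma_m / 2) - 1, the value the partial sum telescopes to."""
    return cmath.exp(1j * t * sum(ys)) * math.exp(t * t * sum(sig2) / 2) - 1
```

The cancellation works because $e^{-t^2 v_k/2} e^{t^2\Sigma_k/2} = e^{t^2\Sigma_{k-1}/2}$, so the kth term is the difference of consecutive values of $e^{itS_k}e^{t^2\Sigma_k/2}$.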
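By (26.4$_2$),

(35.38)   $e^{itY_{nk}} = 1 + itY_{nk} - \frac{1}{2}t^2 Y_{nk}^2 + \theta$,

where (write $I_{nk} = I_{[|Y_{nk}| \ge \epsilon]}$ and let $K_t$ bound $t^2$ and $|t|^3$)

   $|\theta| \le \min\{|tY_{nk}|^3, |tY_{nk}|^2\} \le K_t (Y_{nk}^2 I_{nk} + \epsilon Y_{nk}^2)$.

And

(35.39)   $e^{-t^2\sigma_{nk}^2/2} = 1 - \frac{1}{2}t^2\sigma_{nk}^2 + \theta'$,

where (use (27.15) and increase $K_t$)

   $|\theta'| \le (\frac{1}{2}t^2\sigma_{nk}^2)^2 e^{t^2\sigma_{nk}^2/2} \le t^4 \sigma_{nk}^4 e^{t^2c/2} \le K_t \sigma_{nk}^4$.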


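Because of the condition $E[Y_{nk} \| \mathcal{F}_{n,k-1}] = 0$ and the definition of $\sigma_{nk}^2$, the
right sides of (35.38) and (35.39), minus θ and θ', respectively, have the same
conditional expected value given $\mathcal{F}_{n,k-1}$. By (35.37), therefore,

   $\sum_{k=1}^\infty E[|E[e^{itY_{nk}} - e^{-t^2\sigma_{nk}^2/2} \| \mathcal{F}_{n,k-1}]|]$
   $\le K_t \sum_{k=1}^\infty (E[Y_{nk}^2 I_{nk}] + \epsilon E[\sigma_{nk}^2] + E[\sigma_{nk}^4])$
   $\le K_t (\sum_{k=1}^\infty E[Y_{nk}^2 I_{nk}] + \epsilon c + c E[\sup_{k \ge 1} \sigma_{nk}^2]).$

Since $\sigma_{nk}^2 \le E[\epsilon^2 + Y_{nk}^2 I_{nk} \| \mathcal{F}_{n,k-1}] \le \epsilon^2 + \sum_{j=1}^\infty E[Y_{nj}^2 I_{nj} \| \mathcal{F}_{n,j-1}]$, it follows by
(35.36) that the last expression above is, in the limit, at most $K_t(\epsilon c + c\epsilon^2)$.
Since ε is arbitrary, this completes the proof of the theorem under the
assumption (35.37).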

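To remove this assumption, take $c > \sigma^2$, define $A_{nk} = [\sum_{j=1}^k \sigma_{nj}^2 \le c]$ and
$A_{n\infty} = [\sum_{j=1}^\infty \sigma_{nj}^2 \le c]$, and take $Z_{nk} = Y_{nk} I_{A_{nk}}$. From $A_{nk} \in \mathcal{F}_{n,k-1}$ follow
$E[Z_{nk} \| \mathcal{F}_{n,k-1}] = 0$ and $\tau_{nk}^2 = E[Z_{nk}^2 \| \mathcal{F}_{n,k-1}] = I_{A_{nk}} \sigma_{nk}^2$. Since $\sum_{j=1}^\infty \tau_{nj}^2$ is
$\sum_{j=1}^k \sigma_{nj}^2$ on $A_{nk} - A_{n,k+1}$ and $\sum_{j=1}^\infty \sigma_{nj}^2$ on $A_{n\infty}$, the Z-array satisfies (35.37).
Now $P(A_{n\infty}) \to 1$ by (35.35), and on $A_{n\infty}$, $\tau_{nk}^2 = \sigma_{nk}^2$ for all k, so that the
Z-array satisfies (35.35). And it satisfies (35.36) because $|Z_{nk}| \le |Y_{nk}|$. Therefore,
by the case already treated, $\sum_{k=1}^\infty Z_{nk} \Rightarrow \sigma N$. But since $\sum_{k=1}^\infty Y_{nk}$ coincides
with this last sum on $A_{n\infty}$, it, too, is asymptotically normal. ■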


PROBLEMS 


35.1. Suppose that $\Delta_1, \Delta_2, \ldots$ are independent random variables with mean 0. Let
$X_1 = \Delta_1$ and $X_{n+1} = X_n + \Delta_{n+1} f_n(X_1, \ldots, X_n)$, and suppose that the $X_n$ are
integrable. Show that $\{X_n\}$ is a martingale. The martingales of gambling have
this form.


35.2. Let $Y_1, Y_2, \ldots$ be independent random variables with mean 0 and variance $\sigma^2$.
Let $X_n = (\sum_{k=1}^n Y_k)^2 - n\sigma^2$ and show that $\{X_n\}$ is a martingale.


35.3. Suppose that $\{Y_n\}$ is a finite-state Markov chain with transition matrix $[p_{ij}]$.
Suppose that $\sum_j p_{ij} x(j) = \lambda x(i)$ for all i (the x(i) are the components of a right
eigenvector of the transition matrix). Put $X_n = \lambda^{-n} x(Y_n)$ and show that $\{X_n\}$ is
a martingale.


35.4. Suppose that $Y_1, Y_2, \ldots$ are independent, positive random variables and that
$E[Y_n] = 1$. Put $X_n = Y_1 \cdots Y_n$.

(a) Show that $\{X_n\}$ is a martingale and converges with probability 1 to an
integrable X.

(b) Suppose specifically that $Y_n$ assumes the values 1/2 and 3/2 with probability 1/2
each. Show that X = 0 with probability 1. This gives an example where
$E[\prod_{n=1}^\infty Y_n] \ne \prod_{n=1}^\infty E[Y_n]$ for independent, integrable, positive random
variables. Show, however, that $E[\prod_{n=1}^\infty Y_n] \le \prod_{n=1}^\infty E[Y_n]$ always holds.


35.5. Suppose that $X_1, X_2, \ldots$ is a martingale satisfying $E[X_1] = 0$ and $E[X_n^2] < \infty$.
Show that $E[(X_{n+r} - X_n)^2] = \sum_{k=1}^r E[(X_{n+k} - X_{n+k-1})^2]$ (the variance of the
sum is the sum of the variances). Assume that $\sum_n E[(X_n - X_{n-1})^2] < \infty$ and
prove that $X_n$ converges with probability 1. Do this first by Theorem 35.5 and
then (see Theorem 22.6) by Theorem 35.3.


35.6. Show that a submartingale $X_n$ can be represented as $X_n = Y_n + Z_n$, where $Y_n$
is a martingale and $0 \le Z_1 \le Z_2 \le \cdots$. Hint: Take $X_0 = 0$ and $\Delta_n = X_n - X_{n-1}$,
and define $Z_n = \sum_{k=1}^n E[\Delta_k \| \mathcal{F}_{k-1}]$ ($\mathcal{F}_0 = \{\emptyset, \Omega\}$).


35.7. If $X_1, X_2, \ldots$ is a martingale and bounded either above or below, then
$\sup_n E[|X_n|] < \infty$.


35.8. ↑ Let $X_n = \Delta_1 + \cdots + \Delta_n$, where the $\Delta_n$ are independent and assume the
values ±1 with probability 1/2 each. Let τ be the smallest n such that $X_n = 1$
and define $X_n^*$ by (35.19). Show that the hypotheses of Theorem 35.5 are
satisfied by $\{X_n^*\}$ but that it is impossible to integrate to the limit. Hint: Use
(7.8) and Problem 35.7.


35.9. Let $X_1, X_2, \ldots$ be a martingale, and assume that $|X_1(\omega)|$ and $|X_n(\omega) - X_{n-1}(\omega)|$
are bounded by a constant independent of ω and n. Let τ be a stopping time
with finite mean. Show that $X_\tau$ is integrable and that $E[X_\tau] = E[X_1]$.


35.10. 35.8 35.9↑ Use the preceding result to show that the τ in Problem 35.8 has
infinite mean. Thus the waiting time until a symmetric random walk moves one
step up from the starting point has infinite expected value.


35.11. Let $X_1, X_2, \ldots$ be a Markov chain with countable state space S and transition
probabilities $p_{ij}$. A function φ on S is excessive or superharmonic if $\varphi(i) \ge
\sum_j p_{ij} \varphi(j)$. Show by martingale theory that $\varphi(X_n)$ converges with probability 1
if φ is bounded and excessive. Deduce from this that if the chain is irreducible
and persistent, then φ must be constant. Compare Problem 8.34.


35.12. ↑ A function φ on the integer lattice in $R^k$ is superharmonic if for each
lattice point x, $\varphi(x) \ge (2k)^{-1} \sum \varphi(y)$, the sum extending over the 2k nearest
neighbors y. Show for k = 1 and k = 2 that a bounded superharmonic function
is constant. Show for $k \ge 3$ that there exist nonconstant bounded harmonic
functions.


35.13. 32.7 32.9↑ Let $(\Omega, \mathcal{F}, P)$ be a probability space, let ν be a finite measure
on $\mathcal{F}$, and suppose that $\mathcal{F}_n \uparrow \mathcal{F}_\infty \subset \mathcal{F}$. For $n \le \infty$, let $X_n$ be the Radon-Nikodym
derivative with respect to P of the absolutely continuous part of ν
when P and ν are both restricted to $\mathcal{F}_n$. The problem is to extend Theorem
35.7 by showing that $X_n \to X_\infty$ with probability 1.

(a) For $n \le \infty$, let

   $\nu(A) = \int_A X_n \, dP + \sigma_n(A)$,   $A \in \mathcal{F}_n$,

be the decomposition of ν into absolutely continuous and singular parts with
respect to P on $\mathcal{F}_n$. Show that $X_1, X_2, \ldots$ is a supermartingale and converges
with probability 1.

(b) Let

   $\sigma_\infty(A) = \int_A Z_n \, dP + \sigma_n'(A)$,   $A \in \mathcal{F}_n$,

be the decomposition of $\sigma_\infty$ into absolutely continuous and singular parts with
respect to P on $\mathcal{F}_n$. Let $Y_n = E[X_\infty \| \mathcal{F}_n]$, and prove

   $\int_A (Y_n + Z_n) \, dP + \sigma_n'(A) = \int_A X_n \, dP + \sigma_n(A)$,   $A \in \mathcal{F}_n$.

Conclude that $Y_n + Z_n = X_n$ with probability 1. Since $Y_n$ converges to $X_\infty$, $Z_n$
converges with probability 1 to some Z. Show that $\int_A Z \, dP \le \sigma_\infty(A)$ for $A \in \mathcal{F}_\infty$,
and conclude that Z = 0 with probability 1.


35.14. (a) Show that $\{X_n\}$ is a martingale with respect to $\{\mathcal{F}_n\}$ if and only if, for all n
and all stopping times τ such that $\tau \le n$, $E[X_n \| \mathcal{F}_\tau] = X_\tau$.

(b) Show that, if $\{X_n\}$ is a martingale and τ is a bounded stopping time, then
$E[X_\tau] = E[X_1]$.


35.15. 31.9↑ Suppose that $\mathcal{F}_n \uparrow \mathcal{F}_\infty$ and $A \in \mathcal{F}_\infty$, and prove that $P[A \| \mathcal{F}_n] \to I_A$
with probability 1. Compare Lebesgue's density theorem.


35.16. Theorems 35.6 and 35.9 have analogues in Hilbert space. For $n \le \infty$, let $P_n$ be
the perpendicular projection on a subspace $M_n$. Then $P_n x \to P_\infty x$ for all x if
either (a) $M_1 \subset M_2 \subset \cdots$ and $M_\infty$ is the closure of $\bigcup_{n<\infty} M_n$ or (b) $M_1 \supset M_2
\supset \cdots$ and $M_\infty = \bigcap_{n<\infty} M_n$.


35.17. Suppose that θ has an arbitrary distribution, and suppose that, conditionally
on θ, the random variables $Y_1, Y_2, \ldots$ are independent and normally distributed
with mean θ and variance $\sigma^2$. Construct such a sequence $\{\theta, Y_1, Y_2, \ldots\}$.
Prove (35.31).


35.18. It is shown on p. 471 that optional stopping has no effect on likelihood ratios.
This is not true of tests of significance. Suppose that $X_1, X_2, \ldots$ are independent
and identically distributed and assume the values 1 and 0 with probabilities
p and 1 − p. Consider the null hypothesis that p = 1/2 and the alternative
that p > 1/2. The usual .05-level test of significance is to reject the null
hypothesis if

(35.40)   $\frac{2}{\sqrt{n}} (X_1 + \cdots + X_n - \frac{1}{2}n) > 1.645$.

For this test the chance of falsely rejecting the null hypothesis is approximately
$P[N > 1.645] \approx .05$ if n is large and fixed. Suppose that n is not fixed in
advance of sampling, and show by the law of the iterated logarithm that, even
if p is, in fact, 1/2, there are with probability 1 infinitely many n for which
(35.40) holds.


35.19. (a) Suppose that (35.32) and (35.33) hold. Suppose further that, for constants
$s_n^2$, $s_n^{-2} \sum_{k=1}^n \sigma_k^2 \to_P 1$ and $s_n^{-2} \sum_{k=1}^n E[Y_k^2 I_{[|Y_k| \ge \epsilon s_n]}] \to 0$, and show that
$s_n^{-1} \sum_{k=1}^n Y_k \Rightarrow N$. Hint: Simplify the proof of Theorem 35.11.

(b) The Lindeberg-Lévy theorem for martingales. Suppose that
$\ldots, Y_{-1}, Y_0, Y_1, \ldots$ is stationary and ergodic (p. 494) and that
$E[Y_k^2] < \infty$ and $E[Y_k \| Y_{k-1}, Y_{k-2}, \ldots] = 0$.
Prove that $\sum_{k=1}^n Y_k/\sqrt{n}$ is asymptotically normal. Hint: Use Theorem 36.4 and
the remark following the statement of Lindeberg's Theorem 27.2.


35.20. 24.4↑ Suppose that the σ-field $\mathcal{T}$ in Problem 24.4 is trivial. Deduce from
Theorem 35.9 that $P[A \| T^{-n}\mathcal{F}] \to P[A \| \mathcal{T}] = P(A)$ with probability 1, and
conclude that T is mixing.


CHAPTER 7


Stochastic Processes 


SECTION 36. KOLMOGOROV’S EXISTENCE THEOREM 


Stochastic Processes 


A stochastic process is a collection $[X_t: t \in T]$ of random variables on a
probability space $(\Omega, \mathcal{F}, P)$. The sequence of gambler's fortunes in Section 7,
the sequences of independent random variables in Section 22, the martingales
in Section 35: all these are stochastic processes for which T = {1, 2, ...}.
For the Poisson process $[N_t: t \ge 0]$ of Section 23, $T = [0, \infty)$. For all these
processes the points of T are thought of as representing time. In most cases,
T is the set of integers and time is discrete, or else T is an interval of the line
and time is continuous. For the general theory of this section, however, T can
be quite arbitrary.


Finite-Dimensional Distributions 


A process is usually described in terms of distributions it induces in Euclidean
spaces. For each k-tuple $(t_1, \ldots, t_k)$ of distinct elements of T, the
random vector $(X_{t_1}, \ldots, X_{t_k})$ has over $R^k$ some distribution $\mu_{t_1 \ldots t_k}$:

(36.1)   $\mu_{t_1 \ldots t_k}(H) = P[(X_{t_1}, \ldots, X_{t_k}) \in H]$,   $H \in \mathcal{R}^k$.

These probability measures $\mu_{t_1 \ldots t_k}$ are the finite-dimensional distributions of
the stochastic process $[X_t: t \in T]$. The system of finite-dimensional distributions
does not completely determine the properties of the process. For
example, the Poisson process $[N_t: t \ge 0]$ as defined by (23.5) has sample paths
(functions $N_t(\omega)$ with ω fixed and t varying) that are step functions. But
(23.28) defines a process that has the same finite-dimensional distributions
and has sample paths that are not step functions. Nevertheless, the first step
in a general theory is to construct processes for given systems of finite-dimensional
distributions.


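Now (36.1) implies two consistency properties of the system $\mu_{t_1 \ldots t_k}$.
Suppose the H in (36.1) has the form $H = H_1 \times \cdots \times H_k$ ($H_i \in \mathcal{R}^1$), and
consider a permutation π of (1, 2, ..., k). Since $[(X_{t_1}, \ldots, X_{t_k}) \in (H_1 \times \cdots \times
H_k)]$ and $[(X_{t_{\pi 1}}, \ldots, X_{t_{\pi k}}) \in (H_{\pi 1} \times \cdots \times H_{\pi k})]$ are the same event, it follows
by (36.1) that

(36.2)   $\mu_{t_1 \ldots t_k}(H_1 \times \cdots \times H_k) = \mu_{t_{\pi 1} \ldots t_{\pi k}}(H_{\pi 1} \times \cdots \times H_{\pi k})$.

For example, if $\mu_{s,t} = \nu \times \nu'$, then necessarily $\mu_{t,s} = \nu' \times \nu$.
The second consistency condition is

(36.3)   $\mu_{t_1 \ldots t_{k-1}}(H_1 \times \cdots \times H_{k-1}) = \mu_{t_1 \ldots t_{k-1} t_k}(H_1 \times \cdots \times H_{k-1} \times R^1)$.

This is clear because $(X_{t_1}, \ldots, X_{t_{k-1}})$ lies in $H_1 \times \cdots \times H_{k-1}$ if and only if
$(X_{t_1}, \ldots, X_{t_{k-1}}, X_{t_k})$ lies in $H_1 \times \cdots \times H_{k-1} \times R^1$.

Measures $\mu_{t_1 \ldots t_k}$ coming from a process $[X_t: t \in T]$ via (36.1) necessarily
satisfy (36.2) and (36.3). Kolmogorov's existence theorem says conversely that
if a given system of measures satisfies the two consistency conditions, then
there exists a stochastic process having these finite-dimensional distributions.
The proof is a construction, one which is more easily understood if (36.2) and
(36.3) are combined into a single condition.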
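The two consistency conditions can be verified mechanically for a process with finitely many coordinates and values. The sketch below is only an illustration: the joint law `JOINT` of $(X_1, X_2, X_3)$ on {0,1}^3 is a made-up example, the function names are hypothetical, and the conditions are checked on single points rather than on general rectangles (which suffices for a finite state space).

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint law of (X_1, X_2, X_3) on {0,1}^3; the weights sum to 16.
JOINT = {x: Fraction(1 + x[0] + 2 * x[1] * x[2], 16)
         for x in product((0, 1), repeat=3)}


def fdd(ts):
    """mu_{t_1...t_k} of (36.1), as a dict from value tuples to probabilities (t_i in 1..3)."""
    mu = {}
    for x, p in JOINT.items():
        key = tuple(x[t - 1] for t in ts)
        mu[key] = mu.get(key, Fraction(0)) + p
    return mu


def check_permutation(ts, perm):
    """(36.2) on points; perm is a 0-based permutation of the positions of ts."""
    mu = fdd(ts)
    mu_perm = fdd(tuple(ts[i] for i in perm))
    return all(p == mu_perm.get(tuple(h[i] for i in perm), 0) for h, p in mu.items())


def check_projection(ts):
    """(36.3) on points: dropping the last index equals summing out the last coordinate."""
    mu_k = fdd(ts)
    mu_short = fdd(ts[:-1])
    return all(p == sum(q for h2, q in mu_k.items() if h2[:-1] == h)
               for h, p in mu_short.items())
```

Both checks succeed here for every choice of indices, as they must for distributions that come from an actual process; the existence theorem concerns the converse direction.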

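Define $\varphi_\pi: R^k \to R^k$ by

   $\varphi_\pi(x_1, \ldots, x_k) = (x_{\pi^{-1} 1}, \ldots, x_{\pi^{-1} k})$;

$\varphi_\pi$ applies the permutation π to the coordinates (for example, if π sends $x_3$
to first position, then $\pi^{-1} 1 = 3$). Since $\varphi_\pi^{-1}(H_1 \times \cdots \times H_k) = H_{\pi 1} \times \cdots \times
H_{\pi k}$, it follows from (36.2) that

   $\mu_{t_{\pi 1} \ldots t_{\pi k}} \varphi_\pi^{-1}(H) = \mu_{t_1 \ldots t_k}(H)$

for rectangles H. But then

(36.4)   $\mu_{t_1 \ldots t_k} = \mu_{t_{\pi 1} \ldots t_{\pi k}} \varphi_\pi^{-1}$.

Similarly, if $\varphi: R^k \to R^{k-1}$ is the projection $\varphi(x_1, \ldots, x_k) = (x_1, \ldots, x_{k-1})$,
then (36.3) is the same thing as

(36.5)   $\mu_{t_1 \ldots t_{k-1}} = \mu_{t_1 \ldots t_k} \varphi^{-1}$.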


The conditions (36.4) and (36.5) have a common extension. Suppose that
$(u_1, \ldots, u_m)$ is an m-tuple of distinct elements of T and that each element of
$(t_1, \ldots, t_k)$ is also an element of $(u_1, \ldots, u_m)$. Then $(t_1, \ldots, t_k)$ must be the
initial segment of some permutation of $(u_1, \ldots, u_m)$; that is, $k \le m$ and there
is a permutation π of (1, 2, ..., m) such that

   $(u_{\pi^{-1} 1}, \ldots, u_{\pi^{-1} m}) = (t_1, \ldots, t_k, t_{k+1}, \ldots, t_m)$,

where $t_{k+1}, \ldots, t_m$ are elements of $(u_1, \ldots, u_m)$ that do not appear in
$(t_1, \ldots, t_k)$. Define $\psi: R^m \to R^k$ by

(36.6)   $\psi(x_1, \ldots, x_m) = (x_{\pi^{-1} 1}, \ldots, x_{\pi^{-1} k})$;

ψ applies π to the coordinates and then projects onto the first k of them.
Since $\psi(X_{u_1}, \ldots, X_{u_m}) = (X_{t_1}, \ldots, X_{t_k})$,

(36.7)   $\mu_{t_1 \ldots t_k} = \mu_{u_1 \ldots u_m} \psi^{-1}$.

This contains (36.4) and (36.5) as special cases, but as ψ is a coordinate
permutation followed by a sequence of projections of the form $(x_1, \ldots, x_l) \to
(x_1, \ldots, x_{l-1})$, it is also a consequence of these special cases.
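The map ψ of (36.6) is just a selection of coordinates, which the following sketch makes concrete. The function names `make_psi` and `pushforward` are hypothetical, index tuples are arbitrary hashable labels, and measures are represented as finite dictionaries, an assumption made only so that the image measure $\mu\psi^{-1}$ of (36.7) can be computed by summation.

```python
def make_psi(us, ts):
    """Return psi with psi(x_{u_1},...,x_{u_m}) = (x_{t_1},...,x_{t_k}).

    Every t_j must occur in us; positions[j] plays the role of pi^{-1} j
    (0-based) in (36.6).
    """
    positions = [us.index(t) for t in ts]
    return lambda x: tuple(x[p] for p in positions)


def pushforward(mu, psi):
    """Image measure mu psi^{-1} of a finitely supported measure given as a dict."""
    out = {}
    for x, p in mu.items():
        y = psi(x)
        out[y] = out.get(y, 0) + p
    return out
```

For instance, with us = ('u1', 'u2', 'u3') and ts = ('u3', 'u1'), the map sends (x1, x2, x3) to (x3, x1): a permutation of the coordinates followed by a projection.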


Product Spaces 


The standard construction of the general process involves product spaces.
Let T be an arbitrary index set, and let $R^T$ be the collection of all real
functions on T, that is, all maps from T into the real line. If T = {1, 2, ..., k}, a real
function on T can be identified with a k-tuple $(x_1, \ldots, x_k)$ of real numbers,
and so $R^T$ can be identified with k-dimensional Euclidean space $R^k$. If
T = {1, 2, ...}, a real function on T is a sequence $\{x_1, x_2, \ldots\}$ of real numbers.
If T is an interval, $R^T$ consists of all real functions, however irregular, on the
interval. The theory of $R^T$ is an elaboration of the theory of the analogous
but simpler space $S^\infty$ of Section 2 (p. 27).

Whatever the set T may be, an element of $R^T$ will be denoted x. The
value of x at t will be denoted x(t) or $x_t$, depending on whether x is viewed
as a function of t with domain T or as a vector with components indexed by
the elements t of T. Just as $R^k$ can be regarded as the Cartesian product of
k copies of the real line, $R^T$ can be regarded as a product space: a product
of copies of the real line, one copy for each t in T.

For each t define a mapping $Z_t: R^T \to R^1$ by

(36.8)   $Z_t(x) = x(t) = x_t$.

The $Z_t$ are called the coordinate functions or projections. When later on a
probability measure has been defined on $R^T$, the $Z_t$ will be random variables,
the coordinate variables. Frequently, the value $Z_t(x)$ is instead denoted
Z(t, x). If x is fixed, Z(·, x) is a real function on T and is, in fact, nothing
other than x(·), that is, x itself. If t is fixed, Z(t, ·) is a real function on $R^T$
and is identical with the function $Z_t$ defined by (36.8).


There is a natural generalization to $R^T$ of the idea of the σ-field of
k-dimensional Borel sets. Let $\mathcal{R}^T$ be the σ-field generated by all the
coordinate functions $Z_t$, $t \in T$: $\mathcal{R}^T = \sigma[Z_t: t \in T]$. It is generated by the sets
of the form

   $[x \in R^T: Z_t(x) \in H] = [x \in R^T: x_t \in H]$

for $t \in T$ and $H \in \mathcal{R}^1$. If T = {1, 2, ..., k}, then $\mathcal{R}^T$ coincides with $\mathcal{R}^k$.
Consider the class $\mathcal{R}_0^T$ consisting of the sets of the form

(36.9)   $A = [x \in R^T: (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H]$
         $= [x \in R^T: (x_{t_1}, \ldots, x_{t_k}) \in H]$,

where k is an integer, $(t_1, \ldots, t_k)$ is a k-tuple of distinct points of T, and
$H \in \mathcal{R}^k$. Sets of this form, elements of $\mathcal{R}_0^T$, are called finite-dimensional sets,
or cylinders. Of course, $\mathcal{R}_0^T$ generates $\mathcal{R}^T$. Now $\mathcal{R}_0^T$ is not a σ-field, does
not coincide with $\mathcal{R}^T$ (unless T is finite), but the following argument shows
that it is a field.

[Figure: two gates, at $t_1$ and $t_2$, with sample functions y and z.]
If T is an interval, the cylinder $[x \in R^T: \alpha_1 < x(t_1) \le \beta_1, \alpha_2 < x(t_2) \le \beta_2]$ consists of the
functions that go through the two gates shown; y lies in the cylinder and z does not (they need
not be continuous functions, of course).


The complement of (36.9) is $R^T - A = [x \in R^T: (x_{t_1}, \ldots, x_{t_k}) \in R^k - H]$,
and so $\mathcal{R}_0^T$ is closed under complementation. Suppose that A is given by
(36.9) and B is given by

(36.10)   $B = [x \in R^T: (x_{s_1}, \ldots, x_{s_j}) \in I]$,

where $I \in \mathcal{R}^j$. Let $(u_1, \ldots, u_m)$ be an m-tuple containing all the $t_\alpha$ and all
the $s_\beta$. Now $(t_1, \ldots, t_k)$ must be the initial segment of some permutation of
$(u_1, \ldots, u_m)$, and if ψ is as in (36.6) and $H' = \psi^{-1}H$, then $H' \in \mathcal{R}^m$ and A is
given by

(36.11)   $A = [x \in R^T: (x_{u_1}, \ldots, x_{u_m}) \in H']$

as well as by (36.9). Similarly, B can be put in the form

(36.12)   $B = [x \in R^T: (x_{u_1}, \ldots, x_{u_m}) \in I']$,

where $I' \in \mathcal{R}^m$. But then

(36.13)   $A \cup B = [x \in R^T: (x_{u_1}, \ldots, x_{u_m}) \in H' \cup I']$.

Since $H' \cup I' \in \mathcal{R}^m$, $A \cup B$ is a cylinder. This proves that $\mathcal{R}_0^T$ is a field such
that $\mathcal{R}^T = \sigma(\mathcal{R}_0^T)$.
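The step from (36.9) and (36.10) to (36.13) can be imitated with finite sets. In the sketch below the two-point set `VALUES` stands in for the real line, which is an assumption made purely so that the base sets H and I can be enumerated; the function names are hypothetical.

```python
from itertools import product

VALUES = (0, 1)  # stand-in for the real line, assumed finite for enumeration


def lift(ts, H, us):
    """Base set H' over the index tuple us of the cylinder with base H over ts, as in (36.11)."""
    pos = [us.index(t) for t in ts]
    return {z for z in product(VALUES, repeat=len(us))
            if tuple(z[p] for p in pos) in H}


def union(ts, H, ss, I):
    """Return (us, H' | I'), a cylinder representation of A union B as in (36.13)."""
    us = tuple(dict.fromkeys(ts + ss))  # all t's and s's, repeats dropped, order kept
    return us, lift(ts, H, us) | lift(ss, I, us)
```

Once both cylinders are written over the same index tuple, the union of the cylinders is the cylinder over the union of the base sets, which is all that the field property requires.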

The $Z_t$ are measurable functions on the measurable space $(R^T, \mathcal{R}^T)$. If P
is a probability measure on $\mathcal{R}^T$, then $[Z_t: t \in T]$ is a stochastic process on
$(R^T, \mathcal{R}^T, P)$, the coordinate-variable process.


Kolmogorov's Existence Theorem 


The existence theorem can be stated two ways: 


Theorem 36.1. If $\mu_{t_1 \ldots t_k}$ are a system of distributions satisfying the consistency
conditions (36.2) and (36.3), then there is a probability measure P on $\mathcal{R}^T$
such that the coordinate-variable process $[Z_t: t \in T]$ on $(R^T, \mathcal{R}^T, P)$ has the
$\mu_{t_1 \ldots t_k}$ as its finite-dimensional distributions.


Theorem 36.2. If $\mu_{t_1 \ldots t_k}$ are a system of distributions satisfying the consistency
conditions (36.2) and (36.3), then there exists on some probability space
$(\Omega, \mathcal{F}, P)$ a stochastic process $[X_t: t \in T]$ having the $\mu_{t_1 \ldots t_k}$ as its finite-dimensional
distributions.

For many purposes the underlying probability space is irrelevant, the joint
distributions of the variables in the process being all that matters, so that the
two theorems are equally useful. As a matter of fact, they are equivalent
anyway. Obviously, the first implies the second. To prove the converse,
suppose that the process $[X_t: t \in T]$ on $(\Omega, \mathcal{F}, P)$ has finite-dimensional
distributions $\mu_{t_1 \ldots t_k}$, and define a map $\xi: \Omega \to R^T$ by the requirement

(36.14)   $Z_t(\xi(\omega)) = X_t(\omega)$,   $t \in T$.

For each ω, ξ(ω) is an element of $R^T$, a real function on T, and the
requirement is that $X_t(\omega)$ be its value at t. Clearly,

(36.15)   $\xi^{-1}[x \in R^T: (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H]$
          $= [\omega \in \Omega: (Z_{t_1}(\xi(\omega)), \ldots, Z_{t_k}(\xi(\omega))) \in H]$
          $= [\omega \in \Omega: (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H]$;

since the $X_t$ are random variables, measurable $\mathcal{F}$, this set lies in $\mathcal{F}$ if
$H \in \mathcal{R}^k$. Thus $\xi^{-1}A \in \mathcal{F}$ for $A \in \mathcal{R}_0^T$, and so (Theorem 13.1) ξ is measurable
$\mathcal{F}/\mathcal{R}^T$. By (36.15) and the assumption that $[X_t: t \in T]$ has finite-dimensional
distributions $\mu_{t_1 \ldots t_k}$, $P\xi^{-1}$ (see (13.7)) satisfies

(36.16)   $P\xi^{-1}[x \in R^T: (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H]$
          $= P[\omega \in \Omega: (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H] = \mu_{t_1 \ldots t_k}(H)$.

Thus the coordinate-variable process $[Z_t: t \in T]$ on $(R^T, \mathcal{R}^T, P\xi^{-1})$ also has
finite-dimensional distributions $\mu_{t_1 \ldots t_k}$.

Therefore, to prove either of the two versions of Kolmogorov's existence
theorem is to prove the other one as well.


Example 36.1. Suppose that T is finite, say T = {1, 2, ..., k}. Then
$(R^T, \mathcal{R}^T)$ is $(R^k, \mathcal{R}^k)$, and taking $P = \mu_{1,2,\ldots,k}$ satisfies the requirements of
Theorem 36.1. ■


Example 36.2. Suppose that T = {1, 2, ...} and

(36.17)   $\mu_{t_1 \ldots t_k} = \mu_{t_1} \times \cdots \times \mu_{t_k}$,

where $\mu_1, \mu_2, \ldots$ are probability distributions on the line. The consistency
conditions are easily checked, and the probability measure P guaranteed by
Theorem 36.1 is product measure on the product space $(R^T, \mathcal{R}^T)$. But by
Theorem 20.4 there exists on some $(\Omega, \mathcal{F}, P)$ an independent sequence
$X_1, X_2, \ldots$ of random variables with respective distributions $\mu_1, \mu_2, \ldots$; then
(36.17) is the distribution of $(X_{t_1}, \ldots, X_{t_k})$. For the special case (36.17),
Theorem 36.2 (and hence Theorem 36.1) was thus proved in Section 20. The
existence of independent sequences with prescribed distributions was the
measure-theoretic basis of all the probabilistic developments in Chapters 4,
5, and 6: even dependent processes like the Poisson were constructed from
independent sequences. The existence of independent sequences can also be
made the basis of a proof of Theorems 36.1 and 36.2 in their full generality;
see the second proof below. ■


Example 36.3. The preceding example has an analogue in the space $S^\infty$ of
sequences (2.15). Here the finite set S plays the role of $R^1$, the $z_n(\cdot)$ are
analogues of the $Z_n(\cdot)$, and the product measure defined by (2.21) is the
analogue of the product measure specified by (36.17) with $\mu_i = \mu$. See also
Example 24.2. The theory for $S^\infty$ is simple because S is finite: see Theorem
2.3 and the lemma it depends on. ■


Example 36.4. If T is a subset of the line, it is convenient to use the order
structure of the line and take the $\mu_{s_1 \ldots s_k}$ to be specified initially only for
k-tuples $(s_1, \ldots, s_k)$ that are in increasing order:

(36.18)   $s_1 < s_2 < \cdots < s_k$.

It is natural for example to specify the finite-dimensional distributions for the
Poisson processes for increasing sequences of time points alone; see (23.27).

Assume that the $\mu_{s_1 \ldots s_k}$ for k-tuples satisfying (36.18) have the consistency
property

(36.19)   $\mu_{s_1 \ldots s_{i-1} s_{i+1} \ldots s_k}(H_1 \times \cdots \times H_{i-1} \times H_{i+1} \times \cdots \times H_k)$
          $= \mu_{s_1 \ldots s_k}(H_1 \times \cdots \times H_{i-1} \times R^1 \times H_{i+1} \times \cdots \times H_k)$.

For given $s_1, \ldots, s_k$ satisfying (36.18), take $(X_{s_1}, \ldots, X_{s_k})$ to have distribution
$\mu_{s_1 \ldots s_k}$. If $t_1, \ldots, t_k$ is a permutation of $s_1, \ldots, s_k$, take $\mu_{t_1 \ldots t_k}$ to be the
distribution of $(X_{t_1}, \ldots, X_{t_k})$:

(36.20)   $\mu_{t_1 \ldots t_k}(H_1 \times \cdots \times H_k) = P[X_{t_i} \in H_i, i \le k]$.

This unambiguously defines a collection of finite-dimensional distributions.
Are they consistent?

If $t_{\pi 1}, \ldots, t_{\pi k}$ is a permutation of $t_1, \ldots, t_k$, then it is also a permutation of
$s_1, \ldots, s_k$, and by the definition (36.20), $\mu_{t_{\pi 1} \ldots t_{\pi k}}$ is the distribution of
$(X_{t_{\pi 1}}, \ldots, X_{t_{\pi k}})$, which immediately gives (36.2), the first of the consistency
conditions. Because of (36.19), $\mu_{s_1 \ldots s_{i-1} s_{i+1} \ldots s_k}$ is the distribution of
$(X_{s_1}, \ldots, X_{s_{i-1}}, X_{s_{i+1}}, \ldots, X_{s_k})$, and if $t_k = s_i$, then $t_1, \ldots, t_{k-1}$ is a permutation
of $s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_k$, which are in increasing order. By the definition
(36.20) applied to $t_1, \ldots, t_{k-1}$, it therefore follows that $\mu_{t_1 \ldots t_{k-1}}$ is the
distribution of $(X_{t_1}, \ldots, X_{t_{k-1}})$. But this gives (36.3), the second of the
consistency conditions. ■

It will therefore follow from the existence theorem that if $T \subset R^1$ and
$\mu_{s_1 \ldots s_k}$ is defined for all k-tuples in increasing order, and if (36.19) holds,
then there exists a stochastic process $[X_t: t \in T]$ satisfying (36.1) for increasing
$t_1, \ldots, t_k$.
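The extension (36.20) from increasing tuples to arbitrary orderings can be sketched for a simple discrete process. Everything specific below is an assumption chosen for illustration: the process is the number of heads $S_s$ in the first s tosses of a fair coin, indexed by positive integers, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def mu_increasing(ss):
    """Law of (S_{s_1},...,S_{s_k}) for s_1 < ... < s_k, S counting heads of a fair coin."""
    assert list(ss) == sorted(set(ss))
    gaps = [ss[0]] + [b - a for a, b in zip(ss, ss[1:])]
    mu = {}
    for incs in product(*(range(g + 1) for g in gaps)):
        p = Fraction(1)
        for g, d in zip(gaps, incs):
            p *= Fraction(comb(g, d), 2**g)  # independent binomial increments
        vals, total = [], 0
        for d in incs:
            total += d
            vals.append(total)
        mu[tuple(vals)] = p
    return mu


def mu_any(ts):
    """Definition (36.20): the law for an arbitrary ordering, by permuting coordinates."""
    order = sorted(range(len(ts)), key=lambda i: ts[i])  # positions in increasing time order
    base = mu_increasing(tuple(ts[i] for i in order))
    out = {}
    for vals, p in base.items():
        h = [None] * len(ts)
        for rank, i in enumerate(order):
            h[i] = vals[rank]
        out[tuple(h)] = p
    return out
```

With this definition the law of $(S_3, S_1)$ is the law of $(S_1, S_3)$ with the coordinates swapped, and summing out the $S_1$ coordinate returns the binomial law of $S_3$, which is the consistency that (36.19) supplies.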




Two proofs of Kolmogorov's existence theorem will be given. The first is 
based on the extension theorem of Section 3. 


FIRST PROOF OF KOLMOGOROV'S THEOREM. Consider the first formulation,
Theorem 36.1. If A is the cylinder (36.9), define

(36.21)   $P(A) = \mu_{t_1 \ldots t_k}(H)$.

This gives rise to the question of consistency because A will have other
representations as a cylinder. Suppose, in fact, that A coincides with the
cylinder B defined by (36.10). As observed before, if $(u_1, \ldots, u_m)$ contains all
the $t_\alpha$ and $s_\beta$, A is also given by (36.11), where $H' = \psi^{-1}H$ and ψ is defined
in (36.6). Since the consistency conditions (36.2) and (36.3) imply the more
general one (36.7), $P(A) = \mu_{t_1 \ldots t_k}(H) = \mu_{u_1 \ldots u_m}(H')$. Similarly, (36.10) has
the form (36.12), and $P(B) = \mu_{s_1 \ldots s_j}(I) = \mu_{u_1 \ldots u_m}(I')$. Since the $u_\gamma$ are distinct,
for any real numbers $z_1, \ldots, z_m$ there are points x of $R^T$ for which
$(x_{u_1}, \ldots, x_{u_m}) = (z_1, \ldots, z_m)$. From this it follows that if the cylinders (36.11)
and (36.12) coincide, then H' = I'. Hence A = B implies that $P(A) =
\mu_{u_1 \ldots u_m}(H') = \mu_{u_1 \ldots u_m}(I') = P(B)$, and the definition (36.21) is indeed consistent.

Now consider disjoint cylinders A and B. As usual, the index sets may be
taken identical. Assume then that A is given by (36.11) and B by (36.12), so
that (36.13) holds. If $H' \cap I'$ were nonempty, then $A \cap B$ would be nonempty
as well. Therefore, $H' \cap I' = \emptyset$, and

   $P(A \cup B) = \mu_{u_1 \ldots u_m}(H' \cup I')$
   $= \mu_{u_1 \ldots u_m}(H') + \mu_{u_1 \ldots u_m}(I') = P(A) + P(B)$.

Therefore, P is finitely additive on $\mathcal{R}_0^T$. Clearly, $P(R^T) = 1$.

Suppose that P is shown to be countably additive on $\mathcal{R}_0^T$. By Theorem 3.1,
P will then extend to a probability measure on $\mathcal{R}^T$. By the way P was
defined on $\mathcal{R}_0^T$,

(36.22)   $P[x \in R^T: (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H] = \mu_{t_1 \ldots t_k}(H)$,

and therefore the coordinate process $[Z_t: t \in T]$ will have the required
finite-dimensional distributions.

It suffices, then, to prove P countably additive on $\mathcal{R}_0^T$, and this will follow
if $A_n \in \mathcal{R}_0^T$ and $A_n \downarrow \emptyset$ together imply $P(A_n) \downarrow 0$ (see Example 2.10). Suppose
that $A_1 \supset A_2 \supset \cdots$ and that $P(A_n) \ge \epsilon > 0$ for all n. The problem is to
show that $\bigcap_n A_n$ must be nonempty. Since $A_n \in \mathcal{R}_0^T$, and since the index set
involved in the specification of a cylinder can always be permuted and
expanded, there exists a sequence $t_1, t_2, \ldots$ of points in T for which

   $A_n = [x \in R^T: (x_{t_1}, \ldots, x_{t_n}) \in H_n]$,

where† $H_n \in \mathcal{R}^n$.

Of course, $P(A_n) = \mu_{t_1 \ldots t_n}(H_n)$. By Theorem 12.3 (regularity), there
exists inside $H_n$ a compact set $K_n$ such that $\mu_{t_1 \ldots t_n}(H_n - K_n) < \epsilon/2^{n+1}$. If
$B_n = [x \in R^T: (x_{t_1}, \ldots, x_{t_n}) \in K_n]$, then $P(A_n - B_n) < \epsilon/2^{n+1}$. Put $C_n =
\bigcap_{k=1}^n B_k$. Then $C_n \subset B_n \subset A_n$ and $P(A_n - C_n) < \epsilon/2$, so that $P(C_n) > \epsilon/2
> 0$. Therefore, $C_n \supset C_{n+1}$ and $C_n$ is nonempty.

Choose a point $x^{(n)}$ of $R^T$ in $C_n$. If $n \ge k$, then $x^{(n)} \in C_n \subset C_k \subset B_k$ and
hence $(x_{t_1}^{(n)}, \ldots, x_{t_k}^{(n)}) \in K_k$. Since $K_k$ is bounded, the sequence $\{x_{t_k}^{(1)}, x_{t_k}^{(2)}, \ldots\}$
is bounded for each k. By the diagonal method [A14] select an increasing
sequence $n_1, n_2, \ldots$ of integers such that $\lim_i x_{t_k}^{(n_i)}$ exists for each k. There is
in $R^T$ some point x whose $t_k$th coordinate is this limit for each k. But then,
for each k, $(x_{t_1}, \ldots, x_{t_k})$ is the limit as $i \to \infty$ of $(x_{t_1}^{(n_i)}, \ldots, x_{t_k}^{(n_i)})$ and hence lies
in $K_k$. But that means that x itself lies in $B_k$ and hence in $A_k$. Thus
$x \in \bigcap_{k=1}^\infty A_k$, which completes the proof.‡ ■


The second proof of Kolmogorov's theorem goes in two stages, first for countable
T, then for general T.§


SECOND PROOF FOR COUNTABLE T. The result for countable $T$ will be proved in
its second formulation, Theorem 36.2. It is no restriction to enumerate $T$ as $\{t_1, t_2, \ldots\}$
and then to identify $t_n$ with $n$; in other words, it is no restriction to assume that
$T = \{1, 2, \ldots\}$. Write $\mu_n$ in place of $\mu_{1, 2, \ldots, n}$.

By Theorem 20.4 there exists on a probability space $(\Omega, \mathcal{F}, P)$ (which can be taken
to be the unit interval) an independent sequence $U_1, U_2, \ldots$ of random variables each
uniformly distributed over $(0, 1)$. Let $F_1$ be the distribution function corresponding to
$\mu_1$. If the “inverse” $g_1$ of $F_1$ is defined over $(0, 1)$ by $g_1(s) = \inf[x : s \le F_1(x)]$, then
$X_1 = g_1(U_1)$ has distribution $\mu_1$ by the usual argument: $P[g_1(U_1) \le x] = P[U_1 \le F_1(x)]
= F_1(x)$.
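
The inverse construction just described can be tried on a small example. The following is a minimal sketch, assuming a hypothetical three-point law in the role of $\mu_1$ (the points `xs` and weights `ps` are illustrative only and do not come from the text); it builds $g_1(s) = \inf[x : s \le F_1(x)]$ and lets one check the equivalence $g_1(s) \le x$ if and only if $s \le F_1(x)$, which is what "the usual argument" rests on.

```python
from bisect import bisect_left
from fractions import Fraction

# Hypothetical discrete law in the role of mu_1 (an assumption for illustration).
xs = [-1, 0, 2]
ps = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Cumulative values F_1(xs[i]) = ps[0] + ... + ps[i].
cum = []
total = Fraction(0)
for p in ps:
    total += p
    cum.append(total)


def F1(x):
    """Distribution function F_1(x) = mu_1(-inf, x]."""
    return sum((p for a, p in zip(xs, ps) if a <= x), Fraction(0))


def g1(s):
    """Generalized inverse g_1(s) = inf[x: s <= F_1(x)] for 0 < s < 1."""
    # bisect_left returns the first index i with cum[i] >= s.
    return xs[bisect_left(cum, s)]
```

Because $g_1(s) \le x$ exactly when $s \le F_1(x)$, the set of $s$ in $(0, 1)$ with $g_1(s) \le x$ is an interval of length $F_1(x)$, so a uniform $U_1$ gives $P[g_1(U_1) \le x] = F_1(x)$.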

The problem is to construct $X_2, X_3, \ldots$ inductively in such a way that


$$X_k = h_k(U_1, \ldots, U_k) \tag{36.23}$$


for a Borel function $h_k$ and $(X_1, \ldots, X_n)$ has the distribution $\mu_n$. Assume that
$X_1, \ldots, X_{n-1}$ have been defined ($n \ge 2$): they have joint distribution $\mu_{n-1}$ and (36.23)
holds for $k \le n - 1$. The idea now is to construct an appropriate conditional distribution
function $F_n(x | x_1, \ldots, x_{n-1})$; here $F_n(x | X_1(\omega), \ldots, X_{n-1}(\omega))$ will have the value
that $P[X_n \le x \| X_1, \ldots, X_{n-1}]_\omega$ would have if $X_n$ were already defined. If $g_n(\cdot | x_1, \ldots, x_{n-1})$


†In general, $A_n$ will involve indices $t_1, \ldots, t_{a_n}$, where $a_1 < a_2 < \cdots$. For notational simplicity $a_n$
is taken as $n$. As a matter of fact, this can be arranged anyway: Take $A'_{a_n} = A_n$, $A'_k = [x :
(x_{t_1}, \ldots, x_{t_k}) \in R^k] = R^T$ for $k < a_1$, and $A'_k = [x : (x_{t_1}, \ldots, x_{t_k}) \in H_n \times R^{k - a_n}] = A_n$ for $a_n \le
k < a_{n+1}$. Now relabel $A'_n$ as $A_n$.
‡The last part of the argument is, in effect, the proof that a countable product of compact sets is
compact.

*This second proof, which may be omitted, uses the conditional-probability thec 




is the “inverse” function, then $X_n(\omega) = g_n(U_n(\omega) | X_1(\omega), \ldots, X_{n-1}(\omega))$ will by the
usual argument have the right conditional distribution given $X_1, \ldots, X_{n-1}$, so that
$(X_1, \ldots, X_{n-1}, X_n)$ will have the right distribution over $R^n$.

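To construct the conditional distribution function, apply Theorem 33.3 in
$(R^n, \mathcal{R}^n, \mu_n)$ to get a conditional distribution of the last coordinate of $(x_1, \ldots, x_n)$
given the first $n - 1$ of them. This will have (Theorem 20.1) the form
$\nu(H; x_1, \ldots, x_{n-1})$; it is a probability measure as $H$ varies over $\mathcal{R}^1$, and

$$\int_{(x_1, \ldots, x_{n-1}) \in M} \nu(H; x_1, \ldots, x_{n-1}) \, d\mu_n(x_1, \ldots, x_n)
= \mu_n[x \in R^n : (x_1, \ldots, x_{n-1}) \in M, \ x_n \in H].$$

Since the integrand involves only $x_1, \ldots, x_{n-1}$, and since $\mu_n$ by consistency projects to
$\mu_{n-1}$ under the map $(x_1, \ldots, x_n) \to (x_1, \ldots, x_{n-1})$, a change of variable gives

$$\int_M \nu(H; x_1, \ldots, x_{n-1}) \, d\mu_{n-1}(x_1, \ldots, x_{n-1})
= \mu_n[x \in R^n : (x_1, \ldots, x_{n-1}) \in M, \ x_n \in H].$$

Define $F_n(x | x_1, \ldots, x_{n-1}) = \nu((-\infty, x]; x_1, \ldots, x_{n-1})$. Then $F_n(\cdot | x_1, \ldots, x_{n-1})$ is a
probability distribution function over the line, $F_n(x | \cdot)$ is a Borel function over $R^{n-1}$,
and

$$\int_M F_n(x | x_1, \ldots, x_{n-1}) \, d\mu_{n-1}(x_1, \ldots, x_{n-1})
= \mu_n[y \in R^n : (y_1, \ldots, y_{n-1}) \in M, \ y_n \le x].$$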


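Put $g_n(u | x_1, \ldots, x_{n-1}) = \inf[x : u \le F_n(x | x_1, \ldots, x_{n-1})]$ for $0 < u < 1$. Since
$F_n(x | x_1, \ldots, x_{n-1})$ is nondecreasing and right-continuous in $x$, $g_n(u | x_1, \ldots, x_{n-1}) \le x$
if and only if $u \le F_n(x | x_1, \ldots, x_{n-1})$. Set $X_n = g_n(U_n | X_1, \ldots, X_{n-1})$. Since
$(X_1, \ldots, X_{n-1})$ has distribution $\mu_{n-1}$ and by (36.23) is independent of $U_n$, an
application of (20.30) gives

$$\begin{aligned}
P[(X_1, \ldots, X_{n-1}) \in M, \ X_n \le x]
&= P[(X_1, \ldots, X_{n-1}) \in M, \ U_n \le F_n(x | X_1, \ldots, X_{n-1})] \\
&= \int_M P[U_n \le F_n(x | x_1, \ldots, x_{n-1})] \, d\mu_{n-1}(x_1, \ldots, x_{n-1}) \\
&= \int_M F_n(x | x_1, \ldots, x_{n-1}) \, d\mu_{n-1}(x_1, \ldots, x_{n-1}) \\
&= \mu_n[y \in R^n : (y_1, \ldots, y_{n-1}) \in M, \ y_n \le x].
\end{aligned}$$

Thus $(X_1, \ldots, X_n)$ has distribution $\mu_n$. Note that $X_n$, as a function of $X_1, \ldots, X_{n-1}$
and $U_n$, is a function of $U_1, \ldots, U_n$ because (36.23) was assumed to hold for $k < n$.
Hence (36.23) holds for $k = n$ as well. ∎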
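
The inductive step can be illustrated in the simplest case $n = 2$. The sketch below assumes a hypothetical joint law `mu2` on $\{0, 1\}^2$ (the weights are illustrative and not from the text), forms the conditional distribution function of the second coordinate given the first, and sets $X_1 = g_1(U_1)$, $X_2 = g_2(U_2 | X_1)$. The helper `law_on_grid` is likewise an assumption made for checking: it replaces the uniform pair $(U_1, U_2)$ by an evenly spaced grid of midpoints so that the resulting law can be computed exactly.

```python
from fractions import Fraction

# Hypothetical joint law in the role of mu_2 (an assumption for illustration).
mu2 = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
       (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}


def mu1(a):
    """Marginal law of the first coordinate, as consistency requires."""
    return mu2[(a, 0)] + mu2[(a, 1)]


def g1(u):
    """Inverse of F_1: inf[x: u <= F_1(x)] on the two-point space {0, 1}."""
    return 0 if u <= mu1(0) else 1


def g2(u, x1):
    """Inverse of the conditional distribution function F_2(. | x1)."""
    return 0 if u <= mu2[(x1, 0)] / mu1(x1) else 1


def construct(u1, u2):
    """X_1 = h_1(U_1) and X_2 = h_2(U_1, U_2), in the pattern of (36.23)."""
    x1 = g1(u1)
    return x1, g2(u2, x1)


def law_on_grid(m):
    """Law of (X_1, X_2) when (U_1, U_2) runs over an m-by-m grid of midpoints."""
    counts = {key: 0 for key in mu2}
    for i in range(m):
        for j in range(m):
            u1 = Fraction(2 * i + 1, 2 * m)
            u2 = Fraction(2 * j + 1, 2 * m)
            counts[construct(u1, u2)] += 1
    return {key: Fraction(c, m * m) for key, c in counts.items()}
```

With `m = 24` no grid midpoint falls on a jump of the distribution functions involved, and `law_on_grid(24)` reproduces `mu2` exactly. Note that `construct` uses only `u1` to produce the first coordinate and `(u1, u2)` to produce the second, which is the measurability structure that the induction carries forward.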




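PROOF FOR GENERAL T. Consider $(R^T, \mathcal{R}^T)$ once again. If $S \subset T$, let
$\mathcal{F}_S = \sigma[Z_t : t \in S]$. Then $\mathcal{F}_S \subset \mathcal{F}_T = \mathcal{R}^T$.

Suppose that $S$ is countable. By the case just treated, there exists a process $[X_t :
t \in S]$ on some $(\Omega, \mathcal{F}, P)$ (the space and the process depend on $S$) such that
$(X_{t_1}, \ldots, X_{t_k})$ has distribution $\mu_{t_1 \cdots t_k}$ for every $k$-tuple $(t_1, \ldots, t_k)$ from $S$. Define a
map $\xi : \Omega \to R^T$ by requiring that

$$Z_t(\xi(\omega)) = \begin{cases} X_t(\omega) & \text{if } t \in S, \\ 0 & \text{if } t \notin S. \end{cases}$$

Now (36.15) holds as before if $t_1, \ldots, t_k$ all lie in $S$, and so $\xi$ is measurable $\mathcal{F}/\mathcal{F}_S$.
Further, (36.16) holds for $t_1, \ldots, t_k$ in $S$. Put $P_S = P\xi^{-1}$ on $\mathcal{F}_S$. Then $P_S$ is a
probability measure on $(R^T, \mathcal{F}_S)$, and

$$P_S[x \in R^T : (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H] = \mu_{t_1 \cdots t_k}(H) \tag{36.24}$$

if $H \in \mathcal{R}^k$ and $t_1, \ldots, t_k$ all lie in $S$. (The various spaces $(\Omega, \mathcal{F}, P)$ and processes
$[X_t : t \in S]$ now become irrelevant.)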

If $S_0 \subset S$, and if $A$ is a cylinder (36.9) for which the $t_1, \ldots, t_k$ lie in $S_0$, then $P_{S_0}(A)$
and $P_S(A)$ coincide, their common value being $\mu_{t_1 \cdots t_k}(H)$. Since these cylinders
generate $\mathcal{F}_{S_0}$, $P_{S_0}(A) = P_S(A)$ for all $A$ in $\mathcal{F}_{S_0}$. If $A$ lies both in $\mathcal{F}_{S_1}$ and $\mathcal{F}_{S_2}$, then
$P_{S_1}(A) = P_{S_1 \cup S_2}(A) = P_{S_2}(A)$. Thus $P(A) = P_S(A)$ consistently defines a set function
on the class $\bigcup_S \mathcal{F}_S$, the union extending over the countable subsets $S$ of $T$. If $A_n$ lies
in this union and $A_n \in \mathcal{F}_{S_n}$ ($S_n$ countable), then $S = \bigcup_n S_n$ is countable and $\bigcup_n A_n$
lies in $\mathcal{F}_S$. Thus $\bigcup_S \mathcal{F}_S$ is a $\sigma$-field and so must coincide with $\mathcal{R}^T$. Therefore, $P$ is a
probability measure on $\mathcal{R}^T$, and by (36.24) the coordinate process has under $P$ the
required finite-dimensional distributions. ∎


The Inadequacy of $\mathcal{R}^T$

Theorem 36.3. Let $[X_t : t \in T]$ be a family of real functions on $\Omega$.


(i) If $A \in \sigma[X_t : t \in T]$ and $\omega \in A$, and if $X_t(\omega) = X_t(\omega')$ for all $t \in T$,
then $\omega' \in A$.

(ii) If $A \in \sigma[X_t : t \in T]$, then $A \in \sigma[X_t : t \in S]$ for some countable subset $S$
of $T$.


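PROOF. Define $\xi : \Omega \to R^T$ by $Z_t(\xi(\omega)) = X_t(\omega)$. Let $\mathcal{F} = \sigma[X_t : t \in T]$.
By (36.15), $\xi$ is measurable $\mathcal{F}/\mathcal{R}^T$ and hence $\mathcal{F}$ contains the class $[\xi^{-1}M :
M \in \mathcal{R}^T]$. The latter class is a $\sigma$-field, however, and by (36.15) it contains the
sets $[\omega \in \Omega : (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H]$, $H \in \mathcal{R}^k$, and hence contains the
$\sigma$-field $\mathcal{F}$ they generate. Therefore

$$\sigma[X_t : t \in T] = [\xi^{-1}M : M \in \mathcal{R}^T]. \tag{36.25}$$

This is an infinite-dimensional analogue of Theorem 20.1(i).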




As for (i), the hypotheses imply that $\omega \in A = \xi^{-1}M$ and $\xi(\omega) = \xi(\omega')$, so
that $\omega' \in A$ certainly follows.

For $S \subset T$, let $\mathcal{F}_S = \sigma[X_t : t \in S]$; (ii) says that $\mathcal{F} = \mathcal{F}_T$ coincides with
$\mathcal{G} = \bigcup_S \mathcal{F}_S$, the union extending over the countable subsets $S$ of $T$. If
$A_1, A_2, \ldots$ lie in $\mathcal{G}$, $A_n$ lies in $\mathcal{F}_{S_n}$ for some countable $S_n$, and so $\bigcup_n A_n$ lies
in $\mathcal{G}$ because it lies in $\mathcal{F}_S$ for $S = \bigcup_n S_n$. Thus $\mathcal{G}$ is a $\sigma$-field, and since it
contains the sets $[X_t \in H]$, it contains the $\sigma$-field $\mathcal{F}$ they generate. (This part
of the argument was used in the second proof of the existence theorem.) ∎


From this theorem it follows that various important sets lie outside the
class $\mathcal{R}^T$. Suppose that $T = [0, \infty)$. Of obvious interest is the subset $C$ of $R^T$
consisting of the functions continuous over $[0, \infty)$. But $C$ is not in $\mathcal{R}^T$. For
suppose it were. By part (ii) of the theorem (let $\Omega = R^T$ and put $[Z_t : t \in T]$ in
the role of $[X_t : t \in T]$), $C$ would lie in $\sigma[Z_t : t \in S]$ for some countable
$S \subset [0, \infty)$. But then by part (i) of the theorem (let $\Omega = R^T$ and put $[Z_t : t \in S]$
in the role of $[X_t : t \in T]$), if $x \in C$ and $Z_t(x) = Z_t(y)$ for all $t \in S$, then
$y \in C$. From the assumption that $C$ lies in $\mathcal{R}^T$ thus follows the existence of a
countable set $S$ such that, if $x \in C$ and $x(t) = y(t)$ for all $t$ in $S$, then $y \in C$.
But whatever countable set $S$ may be, for every continuous $x$ there obviously
exist functions $y$ that have discontinuities but agree with $x$ on $S$. Therefore,
$C$ cannot lie in $\mathcal{R}^T$.

What the argument shows is this: A set $A$ in $R^T$ cannot lie in $\mathcal{R}^T$ unless
there exists a countable subset $S$ of $T$ with the property that, if $x \in A$ and
$x(t) = y(t)$ for all $t$ in $S$, then $y \in A$. Thus $A$ cannot lie in $\mathcal{R}^T$ if it
effectively involves all the points $t$ in the sense that, for each $x$ in $A$ and
each $t$ in $T$, it is possible to move $x$ out of $A$ by changing its value at $t$ alone.
And $C$ is such a set. For another, consider the set of functions $x$ over
$T = [0, \infty)$ that are nondecreasing and assume as values $x(t)$ only nonnegative
integers:

$$[x \in R^{[0, \infty)} : x(s) \le x(t), \ s \le t; \ x(t) \in \{0, 1, \ldots\}, \ t \ge 0]. \tag{36.26}$$

This, too, lies outside $\mathcal{R}^T$.

In Section 23 the Poisson process was defined as follows: Let $X_1, X_2, \ldots$
be independent and identically distributed with the exponential distribution
(the probability space $\Omega$ on which they are defined may by Theorem 20.4 be
taken to be the unit interval with Lebesgue measure). Put $S_0 = 0$ and
$S_n = X_1 + \cdots + X_n$. If $S_n(\omega) < S_{n+1}(\omega)$ for $n \ge 0$ and $S_n(\omega) \to \infty$, put $N(t, \omega)
= N_t(\omega) = \max[n : S_n(\omega) \le t]$ for $t \ge 0$; otherwise, put $N(t, \omega) = N_t(\omega) = 0$ for
$t \ge 0$. Then the stochastic process $[N_t : t \ge 0]$ has the finite-dimensional
distributions described by the equations (23.27). The function $N(\cdot, \omega)$ is the
path function or sample function† corresponding to $\omega$, and by the construction
every path function lies in the set (36.26). This is a good thing if the


†Other terms are realization of the process and trajectory.


process is to be a model for, say, calls arriving at a telephone exchange: The
sample path represents the history of the calls, its value at $t$ being the
number of arrivals up to time $t$, and so it ought to be nondecreasing and
integer-valued.
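
The special construction recalled above is easy to simulate. The following is a minimal sketch using only the standard library; the rate `2.0`, the horizon `10.0`, the seed, and the function names are assumptions chosen for illustration. It builds the arrival times $S_n$ from exponential interarrival times and evaluates $N_t = \max[n : S_n \le t]$ on a grid of times, so that the resulting path can be seen to be nondecreasing and integer-valued by construction.

```python
import random


def poisson_arrivals(rate, horizon, rng):
    """Arrival times S_1 < S_2 < ... not exceeding `horizon`, built from
    exponential interarrival times X_1, X_2, ..."""
    arrivals = []
    s = 0.0
    while True:
        s += rng.expovariate(rate)      # S_n = X_1 + ... + X_n
        if s > horizon:
            return arrivals
        arrivals.append(s)


def N(t, arrivals):
    """N_t = max[n: S_n <= t], with S_0 = 0."""
    return sum(1 for s in arrivals if s <= t)


rng = random.Random(0)                  # fixed seed, for reproducibility only
arrivals = poisson_arrivals(rate=2.0, horizon=10.0, rng=rng)
times = [k / 10 for k in range(101)]    # the grid 0.0, 0.1, ..., 10.0
path = [N(t, arrivals) for t in times]
```

Every path produced this way lies in the set (36.26) restricted to the grid, whatever the seed; that is a property of the construction rather than of the finite-dimensional distributions.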

According to Theorem 36.1, there exists a measure $P$ on $\mathcal{R}^T$ for $T = [0, \infty)$
such that the coordinate process $[Z_t : t \ge 0]$ on $(R^T, \mathcal{R}^T, P)$ has the
finite-dimensional distributions of the Poisson process. This time does the path
function $Z(\cdot, x)$ lie in the set (36.26) with probability 1? Since $Z(\cdot, x)$ is just
$x$ itself, the question is whether the set (36.26) has $P$-measure 1. But this set
does not lie in $\mathcal{R}^T$, and so it has no measure at all.

An application of Kolmogorov’s existence theorem will always yield a 
stochastic process with prescribed finite-dimensional distributions, but the 
process may lack certain path-function properties that it is reasonable to 
require of it as a model for some natural phenomenon. The special construc- 
tion of Section 23 gets around this difficulty for the Poisson process, and in 
the next section a special construction will yield a model for Brownian 
motion with continuous paths. Section 38 treats a general method for 
producing stochastic processes that have prescribed finite-dimensional distri- 
butions and at the same time have path functions with desirable regularity 
properties. 


A Return to Ergodic Theory * 


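Write $R^\infty$, $\mathcal{R}^\infty$, $\mathcal{R}_0^\infty$ for $R^T$, $\mathcal{R}^T$, $\mathcal{R}_0^T$ in the case where the index set $\{0, \pm 1, \pm 2, \ldots\}$
consists of all the integers. Then $R^\infty$ is analogous to $S^\infty$ (Sections 2 and 24), except
that here the sequences are doubly infinite:

$$x = (\ldots, Z_{-1}(x), Z_0(x), Z_1(x), \ldots).$$

Let $T$ (not an index set) denote the shift: $Z_k(Tx) = Z_{k+1}(x)$, $k = 0, \pm 1, \ldots$. This is
like the shift in Section 24. Since $A \in \mathcal{R}_0^\infty$ implies $T^{-1}A \in \mathcal{R}_0^\infty$, $T$ is measurable
$\mathcal{R}^\infty/\mathcal{R}^\infty$. Clearly, it is invertible.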

For a stochastic process $X = (\ldots, X_{-1}, X_0, X_1, \ldots)$ on $(\Omega, \mathcal{F}, P)$, define $\xi : \Omega \to R^\infty$
by (36.14): $\xi\omega = X(\omega) = (\ldots, X_{-1}(\omega), X_0(\omega), X_1(\omega), \ldots)$. The measure $P\xi^{-1} = PX^{-1}$
on $(R^\infty, \mathcal{R}^\infty)$ can be viewed as the distribution of $X$. Suppose that $X$ is stationary in
the sense that, for each $k \ge 1$ and $H \in \mathcal{R}^k$, $P[(X_{n+1}, \ldots, X_{n+k}) \in H]$ is the same for
all $n = 0, \pm 1, \ldots$. Then the shift preserves $P\xi^{-1}$ (use (36.16) and Lemma 1, p. 311).
The process $X$ is defined to be ergodic if under $P\xi^{-1}$ the shift is ergodic in the sense
of Section 24.

In the ergodic case, it follows by the ergodic theorem that


$$\frac{1}{n} \sum_{k=1}^{n} f(T^k x) \to \int_{R^\infty} f(x) \, P\xi^{-1}(dx) \tag{36.27}$$


*This topic, which requires Section 24, may be omitted. 




on a set of $P\xi^{-1}$-measure 1, provided $f$ is measurable $\mathcal{R}^\infty$ and integrable. Carry
(36.27) back to $(\Omega, \mathcal{F}, P)$ by the inverse set mapping $\xi^{-1}$. Then


$$\frac{1}{n} \sum_{k=1}^{n} f(\ldots, X_{k-1}, X_k, X_{k+1}, \ldots) \to E[f(\ldots, X_{-1}, X_0, X_1, \ldots)] \tag{36.28}$$


with probability 1: (36.28) holds at $\omega$ if and only if (36.27) holds at $x = \xi\omega = X(\omega)$. It
is understood that on the left in (36.28), $X_k$ is the center coordinate (the 0th
coordinate) of the argument of $f$, and on the right, $X_0$ is the center coordinate: For
stationary, ergodic $X$ and integrable $f$, (36.28) holds with probability 1.

If the $X_n$ are independent, then the $Z_n$ are independent under $P\xi^{-1}$. In this case,
$\lim_n P\xi^{-1}(A \cap T^{-n}B) = P\xi^{-1}(A) P\xi^{-1}(B)$ for $A$ and $B$ in $\mathcal{R}_0^\infty$, because for large
enough $n$ the cylinders $A$ and $T^{-n}B$ depend on disjoint sets of time indices and
hence are independent. But then it follows by approximation (Corollary 1 to Theorem
11.4) that the same limit holds for all $A$ and $B$ in $\mathcal{R}^\infty$. But for invariant $B$, this
implies $P\xi^{-1}(B^c \cap B) = P\xi^{-1}(B^c) P\xi^{-1}(B)$, so that $P\xi^{-1}(B)$ is 0 or 1, and the shift is
ergodic under $P\xi^{-1}$: If $X$ is stationary and independent, then it is ergodic.

If f depends on just one coordinate of x, then (36.28) is in the independent case a 
consequence of the strong law of large numbers, Theorem 22.1. But (36.28) follows by 
the ergodic theorem even if f involves all the coordinates in some complicated way. 
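
A small numerical experiment shows what (36.28) asserts when $f$ involves more than one coordinate. The sketch below is an illustration under stated assumptions: the $X_k$ are taken to be independent fair bits, $f$ is the hypothetical choice $f(x) = Z_0(x) Z_1(x)$, and a finite stretch of the sequence stands in for the doubly infinite one. The terms $X_k X_{k+1}$ are then dependent, so the strong law for independent summands does not apply directly, yet the time average should settle near $E[X_0 X_1] = 1/4$.

```python
import random


def time_average(f, xs, width):
    """Average of f over the n sliding windows of the given width in xs."""
    n = len(xs) - width + 1
    return sum(f(xs[k:k + width]) for k in range(n)) / n


rng = random.Random(12345)                          # fixed seed, illustration only
xs = [rng.randint(0, 1) for _ in range(200_000)]    # independent fair bits

# f looks at two adjacent coordinates, so successive terms are dependent.
avg = time_average(lambda w: w[0] * w[1], xs, width=2)
```

The windows here start at index $k$ rather than being centered at it; by stationarity this only relabels the coordinates and does not change the limit.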

Consider now a measurable real function $\phi$ on $R^\infty$. Define $\psi : R^\infty \to R^\infty$ by


$$\psi(x) = (\ldots, \phi(T^{-1}x), \phi(x), \phi(Tx), \ldots);$$


here $\phi(x)$ is the center coordinate: $Z_k(\psi(x)) = \phi(T^k x)$. It is easy to show that $\psi$ is
measurable $\mathcal{R}^\infty/\mathcal{R}^\infty$ and commutes with the shift in the sense of Example 24.6.
Therefore, $T$ preserves $P\xi^{-1}\psi^{-1}$ if it preserves $P\xi^{-1}$, and it is ergodic under
$P\xi^{-1}\psi^{-1}$ if it is ergodic under $P\xi^{-1}$.

This translates immediately into a result on stochastic processes. Define $Y =
(\ldots, Y_{-1}, Y_0, Y_1, \ldots)$ in terms of $X$ by


$$Y_n = \phi(\ldots, X_{n-1}, X_n, X_{n+1}, \ldots); \tag{36.29}$$


that is to say, $Y(\omega) = \psi(X(\omega)) = \psi\xi\omega$. Since $P\xi^{-1}$ is the distribution of $X$, $P\xi^{-1}\psi^{-1}
= P(\psi\xi)^{-1} = PY^{-1}$ is the distribution of $Y$:


Theorem 36.4. If $X$ is stationary and ergodic, in particular if the $X_n$ are independent
and identically distributed, then $Y$ as defined by (36.29) is stationary and ergodic.


This theorem fails if $Y$ is not defined in terms of $X$ in a time-invariant way, that is, if the
$\phi$ in (36.29) is not the same for all $n$: If $\phi_n(x) = Z_{-n}(x)$ and $\phi$ is replaced by $\phi_n$ in
(36.29), then $Y_n = X_0$; in this case $Y$ happens to be stationary, but it is not ergodic if
the distribution of $X_0$ does not concentrate at a single point.


Example 36.5. The autoregressive model. Let $\phi(x) = \sum_{k=0}^{\infty} \beta^k Z_{-k}(x)$ on the set
where the series converges, and take $\phi(x) = 0$ elsewhere. Suppose that $|\beta| < 1$ and
that the $X_n$ are independent and identically distributed with finite second moments.
Then by Theorem 22.6, $Y_n = \sum_{k=0}^{\infty} \beta^k X_{n-k}$ converges with probability 1, and by
Theorem 36.4, the process $Y$ is ergodic. Note that $Y_{n+1} = \beta Y_n + X_{n+1}$ and that $X_{n+1}$
is independent of $Y_n$. This is the linear autoregressive model of order 1. ∎
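
The recursion and the series in this example can be compared numerically. The following is a minimal sketch under assumptions that are not in the text: the innovations are standard normal, $\beta = 0.5$, the innovations before time 0 are taken to be 0 (so the series is finite), and the seed and sample size are arbitrary. Under these assumptions the stationary second moment is $1/(1 - \beta^2) = 4/3$, and ergodicity suggests that time averages along one long path should approach the corresponding expected values.

```python
import random


def ar1_by_recursion(beta, xs):
    """Y_{n+1} = beta * Y_n + X_{n+1}, started from Y_0 = X_0."""
    ys = [xs[0]]
    for x in xs[1:]:
        ys.append(beta * ys[-1] + x)
    return ys


def ar1_by_series(beta, xs, n):
    """Y_n = sum over k = 0..n of beta^k X_{n-k}; earlier innovations are 0."""
    return sum(beta ** k * xs[n - k] for k in range(n + 1))


rng = random.Random(7)                              # fixed seed, illustration only
beta = 0.5
xs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]   # assumed N(0, 1) innovations
ys = ar1_by_recursion(beta, xs)

mean_y = sum(ys) / len(ys)                          # time average of Y_n
mean_y2 = sum(y * y for y in ys) / len(ys)          # time average of Y_n^2
```

The truncation at time 0 makes the first few $Y_n$ slightly non-stationary, but the effect decays like $\beta^n$ and is negligible in the averages.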




The Hewitt—Savage Theorem* 


Change notation: Let $(R^\infty, \mathcal{R}^\infty)$ be the product space with $\{1, 2, \ldots\}$ as the index set,
the space of one-sided sequences. Let $P$ be a probability measure on $\mathcal{R}^\infty$. If the
coordinate variables $Z_n$ are independent under $P$, then by Theorem 22.3, $P(A)$ is 0
or 1 for each $A$ in the tail $\sigma$-field $\mathcal{T}$. If the $Z_n$ are also identically distributed under
$P$, a stronger result holds.

Let $\mathcal{S}_n$ be the class of $\mathcal{R}^\infty$-sets $A$ that are invariant under permutations of the
first $n$ coordinates: if $\pi$ is a permutation of $\{1, \ldots, n\}$, then $x$ lies in $A$ if and only if
$(Z_{\pi 1}(x), \ldots, Z_{\pi n}(x), Z_{n+1}(x), \ldots)$ does. Then $\mathcal{S}_n$ is a $\sigma$-field. Let $\mathcal{S} = \bigcap_{n=1}^{\infty} \mathcal{S}_n$ be
the $\sigma$-field of $\mathcal{R}^\infty$-sets invariant under all finite permutations of coordinates. Then $\mathcal{S}$
is larger than $\mathcal{T}$, since, for example, the $x$-set where $\sum_{k=1}^{n} Z_k(x) > c_n$ infinitely often
lies in $\mathcal{S}$ but not in $\mathcal{T}$.

The Hewitt-Savage theorem is a zero-one law for $\mathcal{S}$ in the independent,
identically distributed case.


Theorem 36.5. If the $Z_n$ are independent and identically distributed under $P$, then
$P(A)$ is 0 or 1 for each $A$ in $\mathcal{S}$.


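PROOF. By Corollary 1 to Theorem 11.4, there are for given $A$ and $\epsilon$ an $n$ and a
set $U = [(Z_1, \ldots, Z_n) \in H]$ ($H \in \mathcal{R}^n$) such that $P(A \triangle U) < \epsilon$. Let $V =
[(Z_{n+1}, \ldots, Z_{2n}) \in H]$. If the $Z_k$ are independent and identically distributed, then
$P(A \triangle U)$ is the same as


$$P\big([(Z_{n+1}, \ldots, Z_{2n}, Z_1, \ldots, Z_n, Z_{2n+1}, Z_{2n+2}, \ldots) \in A]
\triangle [(Z_{n+1}, \ldots, Z_{2n}) \in H]\big).$$


But if $A \in \mathcal{S}_{2n}$, this is in turn the same as $P(A \triangle V)$. Therefore, $P(A \triangle U) = P(A \triangle V)$.

But then, $P(A \triangle V) < \epsilon$ and $P(A \triangle (U \cap V)) \le P(A \triangle U) + P(A \triangle V) < 2\epsilon$. Since $U$
and $V$ have the same probability and are independent, it follows that $P(A)$ is within
$\epsilon$ of $P(U)$ and hence $P^2(A)$ is within $2\epsilon$ of $P^2(U) = P(U)P(V) = P(U \cap V)$, which is
in turn within $2\epsilon$ of $P(A)$. Therefore, $|P^2(A) - P(A)| \le 4\epsilon$ for all $\epsilon$, and so $P(A)$
must be 0 or 1. ∎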


PROBLEMS 


36.1. ↑ Suppose that $[X_t : t \in T]$ is a stochastic process on $(\Omega, \mathcal{F}, P)$ and $A \in \mathcal{F}$.
Show that there is a countable subset $S$ of $T$ for which $P[A \| X_t, t \in T] =
P[A \| X_t, t \in S]$ with probability 1. Replace $A$ by a random variable and prove
a similar result.


36.2. Let $T$ be arbitrary and let $K(s, t)$ be a real function over $T \times T$. Suppose that
$K$ is symmetric in the sense that $K(s, t) = K(t, s)$ and nonnegative-definite in
the sense that $\sum_{i,j=1}^{k} K(t_i, t_j) x_i x_j \ge 0$ for $k \ge 1$, $t_1, \ldots, t_k$ in $T$, and $x_1, \ldots, x_k$
real. Show that there exists a process $[X_t : t \in T]$ for which $(X_{t_1}, \ldots, X_{t_k})$ has
the centered normal distribution with covariances $K(t_i, t_j)$, $i, j = 1, \ldots, k$.


*This topic may be omitted. 




36.3. Let $L$ be a Borel set on the line, let $\mathcal{L}$ consist of the Borel subsets of $L$, and
let $L^T$ consist of all maps from $T$ into $L$. Define the appropriate notion of
cylinder, and let $\mathcal{L}^T$ be the $\sigma$-field generated by the cylinders. State a version
of Theorem 36.1 for $(L^T, \mathcal{L}^T)$. Assume $T$ countable, and prove this theorem
not by imitating the previous proof but by observing that $L^T$ is a subset of $R^T$
and lies in $\mathcal{R}^T$.


36.4. Suppose that the random variables $X_1, X_2, \ldots$ assume the values 0 and 1 and
$P[X_n = 1 \text{ i.o.}] = 1$. Let $\mu$ be the distribution over $(0, 1]$ of $\sum_{n=1}^{\infty} X_n / 2^n$. Show
that on the unit interval with the measure $\mu$, the digits of the nonterminating
dyadic expansion form a stochastic process with the same finite-dimensional
distributions as $X_1, X_2, \ldots$.


36.5. 36.3↑ There is an infinite-dimensional version of Fubini's theorem. In the
construction in Problem 36.3, let $L = I = (0, 1)$, $T = \{1, 2, \ldots\}$, let $\mathcal{I}$ consist of
the Borel subsets of $I$, and suppose that each $k$-dimensional distribution is the
$k$-fold product of Lebesgue measure over the unit interval. Then $I^T$ is a
countable product of copies of $(0, 1)$, its elements are sequences $x = (x_1, x_2, \ldots)$
of points of $(0, 1)$, and Kolmogorov's theorem ensures the existence on $(I^T, \mathcal{I}^T)$
of a product probability measure $\pi$: $\pi[x : x_i \le a_i, \ i \le n] = a_1 \cdots a_n$ for
$0 \le a_i \le 1$. Let $I^n$ denote the $n$-dimensional unit cube.

(a) Define $\psi : I^n \times I^T \to I^T$ by


$$\psi((x_1, \ldots, x_n), (y_1, y_2, \ldots)) = (x_1, \ldots, x_n, y_1, y_2, \ldots).$$


Show that $\psi$ is measurable $\mathcal{I}^n \times \mathcal{I}^T / \mathcal{I}^T$ and $\psi^{-1}$ is measurable $\mathcal{I}^T / \mathcal{I}^n
\times \mathcal{I}^T$. Show that $(\lambda_n \times \pi)\psi^{-1} = \pi$, where $\lambda_n$ is $n$-dimensional Lebesgue
measure restricted to $I^n$.


(b) Let $f$ be a function measurable $\mathcal{I}^T$ and, for simplicity, bounded. Define


$$f_n(x_{n+1}, x_{n+2}, \ldots) = \int_0^1 \cdots \int_0^1 f(y_1, \ldots, y_n, x_{n+1}, \ldots) \, dy_1 \cdots dy_n;$$


in other words, integrate out the coordinates one by one. Show by Problem
34.18, martingale theory, and the zero-one law that


$$f_n(x_{n+1}, x_{n+2}, \ldots) \to \int_{I^T} f(y) \, \pi(dy) \tag{36.30}$$


except for $x$ in a set of $\pi$-measure 0.

(c) Adopting the point of view of part (a), let $g_n(x_1, \ldots, x_n)$ be the result of
integrating the variable $(y_{n+1}, y_{n+2}, \ldots)$ out (with respect to $\pi$) from
$f(x_1, \ldots, x_n, y_{n+1}, \ldots)$. This may suggestively be written as


$$g_n(x_1, \ldots, x_n) = \int_0^1 \int_0^1 \cdots f(x_1, \ldots, x_n, y_{n+1}, y_{n+2}, \ldots) \, dy_{n+1} \, dy_{n+2} \cdots.$$


Show that $g_n(x_1, \ldots, x_n) \to f(x_1, x_2, \ldots)$ except for $x$ in a set of $\pi$-measure 0.


36.6. (a) Let $T$ be an interval of the line. Show that $\mathcal{R}^T$ fails to contain the sets of:
linear functions, polynomials, constants, nondecreasing functions, functions of
bounded variation, differentiable functions, analytic functions, functions
continuous at a fixed $t_0$, Borel measurable functions. Show that it fails to contain
the set of functions that: vanish somewhere in $T$, satisfy $x(s) < x(t)$ for some
pair with $s < t$, have a local maximum anywhere, fail to have a local maximum.


(b) Let $C$ be the set of continuous functions on $T = [0, \infty)$. Show that $A \in \mathcal{R}^T$
and $A \subset C$ imply that $A = \emptyset$. Show, on the other hand, that $A \in \mathcal{R}^T$ and
$C \subset A$ do not imply that $A = R^T$.


36.7. Not all systems of finite-dimensional distributions can be realized by stochastic
processes for which $\Omega$ is the unit interval. Show that there is on the unit
interval with Lebesgue measure no process $[X_t : t \ge 0]$ for which the $X_t$ are
independent and assume the values 0 and 1 with probability $\frac{1}{2}$ each. Compare
Problem 1.1.


36.8. Here is an application of the existence theorem in which $T$ is not a subset of
the line. Let $(N, \mathcal{N}, \nu)$ be a measure space, and take $T$ to consist of the $\mathcal{N}$-sets
of finite $\nu$-measure. The problem is to construct a generalized Poisson process,
a stochastic process $[X_A : A \in T]$ such that (i) $X_A$ has the Poisson distribution
with mean $\nu(A)$ and (ii) $X_{A_1}, \ldots, X_{A_n}$ are independent if $A_1, \ldots, A_n$ are
disjoint. Hint: To define the finite-dimensional distributions, generalize this
construction: For $A$, $B$ in $T$, consider independent random variables $Y_1, Y_2, Y_3$
having Poisson distributions with means $\nu(A \cap B^c)$, $\nu(A \cap B)$, $\nu(A^c \cap B)$; take
$\mu_{A,B}$ to be the distribution of $(Y_1 + Y_2, Y_2 + Y_3)$.


SECTION 37. BROWNIAN MOTION 


Definition 


A Brownian motion or Wiener process is a stochastic process $[W_t : t \ge 0]$, on
some $(\Omega, \mathcal{F}, P)$, with these three properties:


(i) The process starts at 0:

$$P[W_0 = 0] = 1. \tag{37.1}$$


(ii) The increments are independent: If

$$0 \le t_0 \le t_1 \le \cdots \le t_k, \tag{37.2}$$

then

$$P[W_{t_i} - W_{t_{i-1}} \in H_i, \ i \le k] = \prod_{i \le k} P[W_{t_i} - W_{t_{i-1}} \in H_i]. \tag{37.3}$$


(iii) For $0 \le s < t$ the increment $W_t - W_s$ is normally distributed with mean
0 and variance $t - s$:

$$P[W_t - W_s \in H] = \frac{1}{\sqrt{2\pi(t - s)}} \int_H e^{-x^2 / 2(t - s)} \, dx. \tag{37.4}$$


The existence of such processes will be proved.

Imagine suspended in a fluid a particle bombarded by molecules in
thermal motion. The particle will perform a seemingly random movement
first described by the nineteenth-century botanist Robert Brown. Consider a
single component of this motion (imagine it projected on a vertical axis) and
denote by $W_t$ the height at time $t$ of the particle above a fixed horizontal
plane. Condition (i) is merely a convention: the particle starts at 0. Condition
(ii) reflects a kind of lack of memory. The displacements $W_{t_1} - W_{t_0}, \ldots, W_{t_{k-1}}
- W_{t_{k-2}}$ the particle undergoes during the intervals $[t_0, t_1], \ldots, [t_{k-2}, t_{k-1}]$ in
no way influence the displacement $W_{t_k} - W_{t_{k-1}}$ it undergoes during $[t_{k-1}, t_k]$.
Although the future behavior of the particle depends on its present position,
it does not depend on how the particle got there. As for (iii), that $W_t - W_s$
has mean 0 reflects the fact that the particle is as likely to go up as to go
down: there is no drift. The variance grows as the length of the interval
$[s, t]$; the particle tends to wander away from its position at time $s$, and
having done so suffers no force tending to restore it to that position. To
Norbert Wiener are due the mathematical foundations of the theory of this
kind of random motion.

A Brownian motion path. 


The increments of the Brownian motion process are stationary in the
sense that the distribution of $W_t - W_s$ depends only on the difference $t - s$.
Since $W_0 = 0$, the distribution of these increments is described by saying that
$W_t$ is normally distributed with mean 0 and variance $t$. This implies (37.1). If
$0 \le s \le t$, then by the independence of the increments, $E[W_s W_t] = E[W_s(W_t
- W_s)] + E[W_s^2] = E[W_s] E[W_t - W_s] + E[W_s^2] = s$. This specifies all the
means, variances, and covariances:


$$E[W_t] = 0, \qquad E[W_t^2] = t, \qquad E[W_s W_t] = \min\{s, t\}. \tag{37.5}$$


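If $0 < t_1 < \cdots < t_k$, the joint density of $(W_{t_1}, W_{t_2} - W_{t_1}, \ldots, W_{t_k} - W_{t_{k-1}})$ is
by (20.25) the product of the corresponding normal densities. By the Jacobian
formula (20.20), $(W_{t_1}, \ldots, W_{t_k})$ has density


$$f_{t_1 \cdots t_k}(x_1, \ldots, x_k) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi(t_i - t_{i-1})}} \exp\left[-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right], \tag{37.6}$$


where $t_0 = x_0 = 0$.

Sometimes $W_t$ will be denoted $W(t)$, and its value at $\omega$ will be $W(t, \omega)$.
The nature of the path functions $W(\cdot, \omega)$ will be of great importance.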

The existence of the Brownian motion process follows from Kolmogorov's
theorem. For $0 < t_1 < \cdots < t_k$ let $\mu_{t_1 \cdots t_k}$ be the distribution in $R^k$ with
density (37.6). To put it another way, let $\mu_{t_1 \cdots t_k}$ be the distribution of
$(S_1, \ldots, S_k)$, where $S_i = X_1 + \cdots + X_i$ and where $X_1, \ldots, X_k$ are independent,
normally distributed random variables with mean 0 and variances
$t_1, t_2 - t_1, \ldots, t_k - t_{k-1}$. If $g(x_1, \ldots, x_k) = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$, then
$g(S_1, \ldots, S_k) = (S_1, \ldots, S_{i-1}, S_{i+1}, \ldots, S_k)$ has the distribution prescribed for
$\mu_{t_1 \cdots t_{i-1} t_{i+1} \cdots t_k}$. This is because $X_i + X_{i+1}$ is normally distributed with mean 0
and variance $t_{i+1} - t_{i-1}$; see Example 20.6. Therefore,


$$\mu_{t_1 \cdots t_{i-1} t_{i+1} \cdots t_k} = \mu_{t_1 \cdots t_k} g^{-1}. \tag{37.7}$$


The $\mu_{t_1 \cdots t_k}$ defined in this way for increasing, positive $t_1, \ldots, t_k$ thus
satisfy the conditions for Kolmogorov's existence theorem as modified in
Example 36.4; (37.7) is the same thing as (36.19). Therefore, there does exist
a process $[W_t : t > 0]$ corresponding to the $\mu_{t_1 \cdots t_k}$. Taking $W_t = 0$ for $t = 0$
shows that there exists on some $(\Omega, \mathcal{F}, P)$ a process $[W_t : t \ge 0]$ with the
finite-dimensional distributions specified by the conditions (i), (ii), and (iii).
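
The consistency relation (37.7) can be checked at the level of covariances, which suffices here because all the distributions involved are centered normal. The sketch below is an illustration with hypothetical helper names (`increment_variances`, `covariance_matrix`, `drop`) and an arbitrary choice of times; it computes the covariance matrix of the partial sums $(S_1, \ldots, S_k)$ exactly and compares the result of deleting a coordinate with the matrix built directly from the shortened list of times.

```python
from fractions import Fraction


def increment_variances(ts):
    """Variances t_1, t_2 - t_1, ..., t_k - t_{k-1} of the summands X_i."""
    prev = Fraction(0)
    out = []
    for t in ts:
        out.append(Fraction(t) - prev)
        prev = Fraction(t)
    return out


def covariance_matrix(ts):
    """Cov(S_i, S_j) for S_i = X_1 + ... + X_i with independent X's:
    the sum of the variances of the summands common to both partial sums."""
    v = increment_variances(ts)
    k = len(ts)
    return [[sum(v[:min(i, j) + 1], Fraction(0)) for j in range(k)]
            for i in range(k)]


def drop(matrix, i):
    """Covariance matrix of the vector with coordinate i removed."""
    return [[a for j, a in enumerate(row) if j != i]
            for r, row in enumerate(matrix) if r != i]
```

For increasing positive times the entries come out as $\min\{t_i, t_j\}$, in agreement with (37.5), and deleting a coordinate gives the same matrix as starting from the shortened list of times.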


Continuity of Paths 


If the Brownian motion process is to represent the motion of a particle, it is
natural to require that the path functions $W(\cdot, \omega)$ be continuous. But
Kolmogorov's theorem does not guarantee continuity. Indeed, for $T = [0, \infty)$,
the space $(\Omega, \mathcal{F})$ in the proof of Kolmogorov's theorem is $(R^T, \mathcal{R}^T)$, and as
shown in the last section, the set of continuous functions does not lie in $\mathcal{R}^T$.


A special construction gets around this difficulty. The idea is to use for
dyadic rational $t$ the random variables $W_t$ as already defined and then to
redefine the other $W_t$ in such a way as to ensure continuity. To carry this
through requires proving that with probability 1 the sample path is uniformly
continuous for dyadic rational arguments in bounded intervals.

Fix a space $(\Omega, \mathcal{F}, P)$ and on it a process $[W_t : t \ge 0]$ having the
finite-dimensional distributions prescribed for Brownian motion. Let $D$ be the set
of nonnegative dyadic rationals, let $I_{nk} = [k2^{-n}, (k + 2)2^{-n}]$, and put


$$M_{nk}(\omega) = \sup_{r \in I_{nk} \cap D} |W(r, \omega) - W(k2^{-n}, \omega)|, \qquad
M_n(\omega) = \max_{0 \le k < n2^n} M_{nk}(\omega). \tag{37.8}$$


Suppose it is shown that $\sum_n P[M_n > n^{-1}]$ converges. The first Borel-Cantelli
lemma will then imply that $B = [M_n > n^{-1} \text{ i.o.}]$ has probability 0. But
suppose $\omega$ lies outside $B$. Then for every $t$ and $\epsilon$ there exists an $n$ such that
$t < n$, $2n^{-1} < \epsilon$, and $M_n(\omega) \le n^{-1}$. Take $\delta = 2^{-n}$. Suppose that $r$ and $r'$ are
dyadic rationals in $[0, t]$ and $|r - r'| < \delta$. Then $r$ and $r'$ must for some
$k < n2^n$ lie in a common interval $I_{nk}$ (length $2 \times 2^{-n}$), in which case $|W(r, \omega)
- W(r', \omega)| \le 2M_{nk}(\omega) \le 2M_n(\omega) \le 2n^{-1} < \epsilon$. Therefore, $\omega \notin B$ implies that
$W(r, \omega)$ is for every $t$ uniformly continuous as $r$ ranges over the dyadic
rationals in $[0, t]$, and hence it will have a continuous extension to $[0, \infty)$.

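To prove $\sum_n P[M_n > n^{-1}] < \infty$, use Etemadi's maximal inequality (22.10),
which applies because of the independence of the increments. This, together
with Markov's inequality, gives


$$\begin{aligned}
P\Big[\max_{i \le 2^m} |W(t + \delta i 2^{-m}) - W(t)| > \alpha\Big]
&\le 3 \max_{i \le 2^m} P\big[|W(t + \delta i 2^{-m}) - W(t)| \ge \alpha/3\big] \\
&\le \frac{3^5}{\alpha^4} E\big[(W(t + \delta) - W(t))^4\big] = \frac{3^6 \delta^2}{\alpha^4} = \frac{K\delta^2}{\alpha^4}
\end{aligned}$$


(see (21.7) for the moments of the normal distribution). The sets on the left
here increase with $m$, and letting $m \to \infty$ leads to


$$P\Big[\sup_{0 \le r \le 1, \ r \in D} |W(t + r\delta) - W(t)| > \alpha\Big] \le \frac{K\delta^2}{\alpha^4}. \tag{37.9}$$


Therefore,

$$P[M_n > n^{-1}] \le n2^n \cdot \frac{K(2 \times 2^{-n})^2}{n^{-4}} = \frac{4Kn^5}{2^n},$$


and $\sum_n P[M_n > n^{-1}]$ does converge.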
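
That the bounds $4Kn^5/2^n$ are summable can be seen from the ratio of successive terms. The following is a minimal sketch in exact arithmetic; the value `K = 3 ** 6` is an assumption (one admissible constant obtained from the fourth moment of the normal distribution), and the argument does not depend on it, since $K$ cancels in the ratio.

```python
from fractions import Fraction

K = 3 ** 6   # assumed value of the constant; any finite K gives the same ratios


def bound(n):
    """The upper bound 4 K n^5 / 2^n on P[M_n > 1/n]."""
    return Fraction(4 * K * n ** 5, 2 ** n)


def ratio(n):
    """bound(n + 1) / bound(n), which equals (1 + 1/n)^5 / 2."""
    return bound(n + 1) / bound(n)
```

The ratio decreases to $1/2$ and is already below $0.81$ for $n \ge 10$, so the tail of the series is dominated by a convergent geometric series. The early terms exceed 1 and carry no information as probability bounds; only the tail matters for the Borel-Cantelli lemma.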




Therefore, there exists a measurable set $B$ such that $P(B) = 0$ and such
that for $\omega$ outside $B$, $W(r, \omega)$ is uniformly continuous as $r$ ranges over the
dyadic rationals in any bounded interval. If $\omega \notin B$ and $r$ decreases to $t$
through dyadic rational values, then $W(r, \omega)$ has the Cauchy property and
hence converges. Put


$$W'_t(\omega) = W'(t, \omega) = \begin{cases} \lim_{r \downarrow t} W(r, \omega) & \text{if } \omega \notin B, \\ 0 & \text{if } \omega \in B, \end{cases}$$


where $r$ decreases to $t$ through the set $D$ of dyadic rationals. By construction,
$W'(t, \omega)$ is continuous in $t$ for each $\omega$ in $\Omega$. If $\omega \notin B$, then $W(r, \omega) =
W'(r, \omega)$ for dyadic rationals, and $W'(\cdot, \omega)$ is the continuous extension to all
of $[0, \infty)$.

The next thing is to show that the $W'_t$ have the same joint distributions as
the $W_t$. It is convenient to prove this by a lemma which will be used again
further on.


Lemma 1. Let $X_n$ and $X$ be $k$-dimensional random vectors, and let $F_n(x)$
be the distribution function of $X_n$. If $X_n \to X$ with probability 1 and $F_n(x) \to
F(x)$ for all $x$, then $F(x)$ is the distribution function of $X$.


PROOF.* Let $X$ have distribution function $H$. By two applications of
Theorem 4.1, if $h > 0$, then


$$\begin{aligned}
F(x_1, \ldots, x_k) &= \limsup_n F_n(x_1, \ldots, x_k) \le H(x_1, \ldots, x_k) \\
&\le \liminf_n F_n(x_1 + h, \ldots, x_k + h) \\
&= F(x_1 + h, \ldots, x_k + h).
\end{aligned}$$

It follows by continuity from above that $F$ and $H$ agree. ∎


Now, for $0 < t_1 < \cdots < t_k$, choose dyadic rationals $r_i(n)$ decreasing to the
$t_i$. Apply Lemma 1 with $(W_{r_1(n)}, \ldots, W_{r_k(n)})$ and $(W'_{t_1}, \ldots, W'_{t_k})$ in the roles of
$X_n$ and $X$, and with the distribution function with density (37.6) in the role of
$F$. Since (37.6) is continuous in the $t_i$, it follows by Scheffé's theorem that
$F_n(x) \to F(x)$, and by construction $X_n \to X$ with probability 1. By the lemma,
$(W'_{t_1}, \ldots, W'_{t_k})$ has distribution function $F$, which of course is also the distribution
function of $(W_{t_1}, \ldots, W_{t_k})$.

Thus $[W'_t : t \ge 0]$ is a stochastic process, on the same probability space as
$[W_t : t \ge 0]$, which has the finite-dimensional distributions required for Brownian
motion and moreover has a continuous sample path $W'(\cdot, \omega)$ for every $\omega$.


*The lemma is an obvious consequence of the weak-convergence theory of Section 29; the point 
of the special argument is to keep the development independent of Chapters 5 and 6. 




By enlarging the set $B$ in the definition of $W'_t(\omega)$ to include all the $\omega$ for
which $W(0, \omega) \ne 0$, one can also ensure that $W'(0, \omega) = 0$. Now discard the
original random variables $W_t$ and relabel $W'_t$ as $W_t$. The new $[W_t : t \ge 0]$ is a
stochastic process satisfying conditions (i), (ii), and (iii) for Brownian motion
and this one as well:


(iv) For each $\omega$, $W(t, \omega)$ is continuous in $t$ and $W(0, \omega) = 0$.


From now on, by a Brownian motion will be meant a process satisfying (iv)
as well as (i), (ii), and (iii). What has been proved is this:


Theorem 37.1. There exist processes $[W_t : t \ge 0]$ satisfying conditions (i),
(ii), (iii), and (iv), that is, Brownian motion processes.


In the construction above, $W_r$ for dyadic $r$ was used to define $W_t$ in
general. For that reason it suffices to apply Kolmogorov's theorem for a
countable index set. By the second proof of that theorem, the space $(\Omega, \mathcal{F}, P)$
can be taken as the unit interval with Lebesgue measure.

The next section treats a general scheme for dealing with path-function 
questions by in effect replacing an uncountable time set by a countable one. 


Measurable Processes 

Let $T$ be a Borel set on the line, let $[X_t : t \in T]$ be a stochastic process on an
$(\Omega, \mathcal{F}, P)$, and consider the mapping

$$(t, \omega) \to X_t(\omega) = X(t, \omega) \tag{37.10}$$


carrying $T \times \Omega$ into $R^1$. Let $\mathcal{T}$ be the $\sigma$-field of Borel subsets of $T$. The
process is said to be measurable if the mapping (37.10) is measurable
$\mathcal{T} \times \mathcal{F} / \mathcal{R}^1$.

In the presence of measurability, each sample path $X(\cdot, \omega)$ is measurable
$\mathcal{T}$ by Theorem 18.1. Then, for example, $\int_a^b \phi(X(t, \omega)) \, dt$ makes sense if
$(a, b) \subset T$ and $\phi$ is a Borel function, and by Fubini's theorem


$$E\left[\int_a^b \phi(X(t, \cdot)) \, dt\right] = \int_a^b E[\phi(X_t)] \, dt \quad \text{if} \quad \int_a^b E[|\phi(X_t)|] \, dt < \infty.$$

Hence the usefulness of this result:


Theorem 37.2. Brownian motion is measurable. 


PROOF. If


$W^{(n)}(t, \omega) = W(k2^{-n}, \omega)$ for


then the mapping $(t, \omega) \to W^{(n)}(t, \omega)$ is measurable $\mathcal{T} \times \mathcal{F}$. But by the
continuity of the sample paths, this mapping converges to the mapping
(37.10) pointwise (for every $(t, \omega)$), and so by Theorem 13.4(ii) the latter
mapping is also measurable $\mathcal{T} \times \mathcal{F} / \mathcal{R}^1$. ∎


Irregularity of Brownian Motion Paths 


Starting with a Brownian motion $[W_t: t \ge 0]$, define

(37.11)    $W'_t(\omega) = c^{-1} W_{c^2 t}(\omega)$,

where $c > 0$. Since $t \mapsto c^2 t$ is an increasing function, it is easy to see
that the process $[W'_t: t \ge 0]$ has independent increments. Moreover,
$W'_t - W'_s = c^{-1}(W_{c^2 t} - W_{c^2 s})$, and for $s \le t$ this is normally
distributed with mean 0 and variance $c^{-2}(c^2 t - c^2 s) = t - s$. Since the paths
$W'(\cdot, \omega)$ all start from 0 and are continuous, $[W'_t: t \ge 0]$ is another
Brownian motion. In (37.11) the time scale is contracted by the factor $c^2$, but
the other scale only by the factor $c$.
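The variance computation behind (37.11) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the sample size, and the seed are hypothetical choices, and the increment $W_{c^2 t} - W_{c^2 s}$ is sampled directly from its normal distribution rather than from a simulated path.

```python
import random
import statistics

def scaled_increment(c, s, t, rng):
    """One sample of W'_t - W'_s, where W'_u = W_{c^2 u} / c as in (37.11)."""
    # W_{c^2 t} - W_{c^2 s} is normal with mean 0 and variance c^2 (t - s).
    raw = rng.gauss(0.0, (c * c * (t - s)) ** 0.5)
    return raw / c

def sample_variance_of_scaled_increment(c, s, t, n_samples=100_000, seed=0):
    """Empirical variance of W'_t - W'_s; should be close to t - s for any c > 0."""
    rng = random.Random(seed)
    xs = [scaled_increment(c, s, t, rng) for _ in range(n_samples)]
    return statistics.pvariance(xs)
```

For instance, `sample_variance_of_scaled_increment(3.0, 0.5, 2.0)` should come out near $t - s = 1.5$, whatever the value of $c$.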

That the transformation (37.11) preserves the properties of Brownian
motion implies that the paths, although continuous, must be highly irregular.
It seems intuitively clear that for $c$ large enough the path $W(\cdot, \omega)$ must
with probability nearly 1 have somewhere in the time interval $[0, c]$ a chord with
slope exceeding, say, 1. But then $W'(\cdot, \omega)$ has in $[0, c^{-1}]$ a chord with
slope exceeding $c$. Since the $W'_t$ are distributed as the $W_t$, this makes it
plausible that $W(\cdot, \omega)$ must in arbitrarily small intervals $[0, \delta]$ have
chords with arbitrarily great slopes, which in turn makes it plausible that
$W(\cdot, \omega)$ cannot be differentiable at 0. More generally, mild irregularities
in the path will become ever more extreme under the transformation (37.11) with
ever larger values of $c$. It is shown below that, in fact, the paths are with
probability 1 nowhere differentiable.

Also interesting in this connection is the transformation

(37.12)    $W''_t(\omega) = \begin{cases} t\,W_{1/t}(\omega) & \text{if } t > 0, \\ 0 & \text{if } t = 0. \end{cases}$

Again it is easily checked that the increments are independent and normally
distributed with the means and variances appropriate to Brownian motion.
Moreover, the path $W''(\cdot, \omega)$ is continuous except possibly at $t = 0$. But
(37.9) holds with $W''_t$ in place of $W_t$ because it depends only on the
finite-dimensional distributions, and by the continuity of $W''(\cdot, \omega)$ over
$(0, \infty)$ the supremum is the same if not restricted to dyadic rationals.
Therefore, $P[\sup_{t \le n^{-3}} |W''_t| > n^{-1}] \le K/n^2$, and it follows by the
first Borel-Cantelli lemma that $W''(\cdot, \omega)$ is continuous also at 0 for
$\omega$ outside a set $M$ of probability 0. For $\omega \in M$, redefine
$W''(t, \omega) \equiv 0$; then $[W''_t: t \ge 0]$ is a Brownian motion and (37.12)
holds with probability 1.
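One way to carry out the check that (37.12) has the right finite-dimensional distributions is through covariances. This is a sketch, not the text's own argument: it assumes the standard fact that a mean-zero Gaussian family with $\operatorname{Cov}(W_u, W_v) = \min\{u, v\}$ has independent normal increments with the Brownian variances.

```latex
\operatorname{Cov}(W''_s, W''_t)
  = st \,\operatorname{Cov}(W_{1/s}, W_{1/t})
  = st \,\min\{1/s,\, 1/t\}
  = \frac{st}{\max\{s, t\}}
  = \min\{s, t\}, \qquad s, t > 0.
```

Since $(W''_{t_1}, \ldots, W''_{t_k})$ is a linear image of a Gaussian vector, it is itself Gaussian with mean 0, and the covariance above is the one that goes with Brownian motion.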




The behavior of $W(\cdot, \omega)$ near 0 can be studied through the behavior of
$W''(\cdot, \omega)$ near $\infty$ and vice versa. Since $(W''_t - W''_0)/t = W_{1/t}$,
$W''(\cdot, \omega)$ cannot have a derivative at 0 if $W(\cdot, \omega)$ has no limit
at $\infty$. Now, in fact,

(37.13)    $\inf_n W_n = -\infty, \qquad \sup_n W_n = +\infty$

with probability 1. To prove this, note that $W_n = X_1 + \cdots + X_n$, where the
$X_k = W_k - W_{k-1}$ are independent. Consider

$\left[\sup_n W_n < \infty\right] = \bigcup_{u=1}^{\infty} \bigcap_{m=1}^{\infty} \left[\max_{i \le m} W_i \le u\right]$;

this is a tail set and hence by the zero-one law has probability 0 or 1. Now
$-X_1, -X_2, \ldots$ have the same joint distributions as $X_1, X_2, \ldots$, and so
this event has the same probability as

$\left[\inf_n W_n > -\infty\right] = \bigcup_{u=1}^{\infty} \bigcap_{m=1}^{\infty} \left[\max_{i \le m} (-W_i) \le u\right]$.

If these two sets have probability 1, so has $[\sup_n |W_n| < \infty]$, so that
$P[\sup_n |W_n| < x] > 0$ for some $x$. But
$P[|W_n| < x] = P[|W_1| < x/n^{1/2}] \to 0$. This proves (37.13).

Since (37.13) holds with probability 1, $W''(\cdot, \omega)$ has with probability 1
upper and lower right derivatives of $+\infty$ and $-\infty$ at $t = 0$. The same
must be true of every Brownian motion. A similar argument shows that, for each
fixed $t$, $W(\cdot, \omega)$ is nondifferentiable at $t$ with probability 1. In fact,
$W(\cdot, \omega)$ is nowhere differentiable:


Theorem 37.3. For $\omega$ outside a set of probability 0, $W(\cdot, \omega)$ is nowhere
differentiable.


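Proof. The proof is direct: it makes no use of the transformations (37.11)
and (37.12). Let

(37.14)    $X_{nk} = \max\left\{ \left|W\left(\tfrac{k+1}{2^n}\right) - W\left(\tfrac{k}{2^n}\right)\right|,\ \left|W\left(\tfrac{k+2}{2^n}\right) - W\left(\tfrac{k+1}{2^n}\right)\right|,\ \left|W\left(\tfrac{k+3}{2^n}\right) - W\left(\tfrac{k+2}{2^n}\right)\right| \right\}$.

By independence and the fact that the differences here have the distribution
of $2^{-n/2} W_1$, $P[X_{nk} \le \varepsilon] = (P[|W_1| \le 2^{n/2}\varepsilon])^3$; since
the standard normal density is bounded by 1,
$P[X_{nk} \le \varepsilon] \le (2 \times 2^{n/2}\varepsilon)^3$. If
$Y_n = \min_{k \le n2^n} X_{nk}$, then

(37.15)    $P[Y_n \le \varepsilon] \le n2^n (2 \times 2^{n/2}\varepsilon)^3$.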




Consider now the upper and lower right-hand derivatives

$D^W(t, \omega) = \limsup_{h \downarrow 0} \frac{W(t+h, \omega) - W(t, \omega)}{h}$,

$D_W(t, \omega) = \liminf_{h \downarrow 0} \frac{W(t+h, \omega) - W(t, \omega)}{h}$.

Define $E$ (not necessarily in $\mathcal{F}$) as the set of $\omega$ such that
$D^W(t, \omega)$ and $D_W(t, \omega)$ are both finite for some value of $t$. Suppose
that $\omega$ lies in $E$, and suppose specifically that

$-K < D_W(t, \omega) \le D^W(t, \omega) < K$.

There exists a positive $\delta$ (depending on $\omega$, $t$, and $K$) such that
$t \le s \le t + \delta$ implies $|W(s, \omega) - W(t, \omega)| \le K|s - t|$. If $n$
exceeds some $n_0$ (depending on $\delta$, $K$, and $t$), then

$4 \times 2^{-n} < \delta, \qquad 8K < n, \qquad n > t$.

Given such an $n$, choose $k$ so that $(k-1)2^{-n} \le t < k2^{-n}$. Then
$|i2^{-n} - t| < \delta$ for $i = k, k+1, k+2, k+3$, and therefore
$X_{nk}(\omega) \le 2K(4 \times 2^{-n}) < n2^{-n}$. Since $k - 1 \le t2^n < n2^n$,
$Y_n(\omega) \le n2^{-n}$.

What has been shown is that if $\omega$ lies in $E$, then $\omega$ lies in
$A_n = [Y_n \le n2^{-n}]$ for all sufficiently large $n$: $E \subset \liminf_n A_n$.
By (37.15),

$P(A_n) \le n2^n (2 \times 2^{n/2} \times n2^{-n})^3 \to 0$.

By Theorem 4.1, $\liminf_n A_n$ has probability 0, and outside this set
$W(\cdot, \omega)$ is nowhere differentiable; in fact, nowhere does it have finite
upper and lower right-hand derivatives. (Similarly, outside a set of probability 0,
nowhere does $W(\cdot, \omega)$ have finite upper and lower left-hand derivatives.) ∎
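The convergence of the bound on $P(A_n)$ to 0 is a matter of simplifying exponents. The worked line below makes this explicit; the remark about two increments is an added observation, not part of the original argument.

```latex
n 2^{n} \bigl(2 \cdot 2^{n/2} \cdot n 2^{-n}\bigr)^{3}
  = n 2^{n} \cdot 8 n^{3}\, 2^{-3n/2}
  = 8 n^{4}\, 2^{-n/2} \longrightarrow 0 .
```

With only two consecutive increments in (37.14) the same computation would give $n2^n (2 \cdot 2^{n/2} \cdot n2^{-n})^2 = 4n^3$, which does not tend to 0. This suggests why three increments are used.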


If $A$ is the set of $\omega$ for which $W(\cdot, \omega)$ has a derivative somewhere,
what has been shown is that $A \subset B$ for a measurable $B$ such that $P(B) = 0$;
$P(A) = 0$ if $A$ is measurable, but this has not been proved. To avoid such
problems in the study of continuous-time processes, it is convenient to work
in a complete probability space. The space $(\Omega, \mathcal{F}, P)$ is complete
(see p. 44) if $A \subset B$, $B \in \mathcal{F}$, and $P(B) = 0$ together imply that
$A \in \mathcal{F}$ (and then, of course, $P(A) = 0$). If the space is not already
complete, it is possible to enlarge $\mathcal{F}$ to a new $\sigma$-field and extend
$P$ to it in such a way that the new space is complete. The following assumption
therefore entails no loss of generality: For the rest of this section the space
$(\Omega, \mathcal{F}, P)$ on which the Brownian motion is defined is assumed complete.
Theorem 37.3 now becomes: $W(\cdot, \omega)$ is with probability 1 nowhere
differentiable.




A nowhere-differentiable path represents the motion of a particle that at 
no time has a velocity. Since a function of bounded variation is differentiable 
almost everywhere (Section 31), $W(\cdot, \omega)$ is of unbounded variation with
probability 1. Such a path represents the motion of a particle that in its 
wanderings back and forth travels an infinite distance in finite time. The 
Brownian motion model thus does not in its fine structure represent physical 
reality. The irregularity of the Brownian motion paths is of considerable 
mathematical interest, however. A continuous, nowhere-differentiable function
is regarded as pathological, or used to be, but from the Brownian-motion
point of view such functions are the rule, not the exception.†

The set of zeros of the Brownian motion is also interesting. By property
(iv), $t = 0$ is a zero of $W(\cdot, \omega)$ for each $\omega$. Now $[W''_t: t \ge 0]$ as
defined by (37.12) is another Brownian motion, and so by (37.13) the sequence
$\{W''_n: n = 1, 2, \ldots\} = \{nW_{1/n}: n = 1, 2, \ldots\}$ has supremum $+\infty$ and
infimum $-\infty$ for $\omega$ outside a set of probability 0; for such an $\omega$,
$W(\cdot, \omega)$ changes sign infinitely often near 0 and hence by continuity has
zeros arbitrarily near 0. Let $Z(\omega)$ denote the set of zeros of
$W(\cdot, \omega)$. What has just been shown is that $0 \in Z(\omega)$ for each
$\omega$ and that 0 is with probability 1 a limit of positive points in
$Z(\omega)$. From (37.13) it also follows that $Z(\omega)$ is with probability 1
unbounded above. More is true:


Theorem 37.4. The set $Z(\omega)$ is with probability 1 perfect [A15],
unbounded, nowhere dense, and of Lebesgue measure 0.


Proof. Since $W(\cdot, \omega)$ is continuous, $Z(\omega)$ is closed for every
$\omega$. Let $\lambda$ denote Lebesgue measure. Since Brownian motion is measurable
(Theorem 37.2), Fubini's theorem applies:

$\int_\Omega \lambda(Z(\omega))\,P(d\omega) = (\lambda \times P)[(t, \omega): W(t, \omega) = 0] = \int_0^\infty P[\omega: W(t, \omega) = 0]\,dt = 0$.

Thus $\lambda(Z(\omega)) = 0$ with probability 1.

If $W(\cdot, \omega)$ is nowhere differentiable, it cannot vanish throughout an
interval $I$ and hence must by continuity be nonzero throughout some subinterval
of $I$. By Theorem 37.3, then, $Z(\omega)$ is with probability 1 nowhere dense.

It remains to show that each point of $Z(\omega)$ is a limit of other points of
$Z(\omega)$. As observed above, this is true of the point 0 of $Z(\omega)$. For the
general point of $Z(\omega)$, a stopping-time argument is required. Fix $r \ge 0$ and let


† For the construction of a specific example, see Problem 31.18.


$\tau(\omega) = \inf[t: t \ge r, W(t, \omega) = 0]$; note that this set is nonempty
with probability 1 by (37.13). Thus $\tau(\omega)$ is the first zero following $r$. Now

$[\omega: \tau(\omega) \le t] = \left[\omega: \inf_{r \le s \le t} |W(s, \omega)| = 0\right]$,

and by continuity the infimum here is unchanged if $s$ is restricted to
rationals. This shows that $\tau$ is a random variable and that

$[\omega: \tau(\omega) \le t] \in \sigma[W_u: u \le t]$.

A nonnegative random variable with this property is a stopping time.
To know the value of $\tau$ is to know at most the values of $W_u$ for $u \le \tau$.


Since the increments are independent, it therefore seems intuitively clear
that the process

(37.16)    $W^*_t(\omega) = W_{\tau(\omega)+t}(\omega) - W_{\tau(\omega)}(\omega) = W_{\tau(\omega)+t}(\omega), \qquad t \ge 0$,

ought itself to be a Brownian motion. This is, in fact, true by the next result,
Theorem 37.5. What is proved there is that the finite-dimensional distributions
of $[W^*_t: t \ge 0]$ are the right ones for Brownian motion. The other
properties are obvious: $W^*(\cdot, \omega)$ is continuous and vanishes at 0 by
construction, and the space on which $[W^*_t: t \ge 0]$ is defined is complete
because it is the original space $(\Omega, \mathcal{F}, P)$, assumed complete.

If $[W^*_t: t \ge 0]$ is indeed a Brownian motion, then, as observed above, for
$\omega$ outside a set $B_r$ of probability 0 there is a positive sequence $\{t_n\}$
such that $t_n \to 0$ and $W^*(t_n, \omega) = 0$. But then
$W(\tau(\omega) + t_n, \omega) = 0$, so that $\tau(\omega)$, a zero of
$W(\cdot, \omega)$, is the limit of other larger zeros of $W(\cdot, \omega)$. Now
$\tau(\omega)$ was the first zero following $r$. (There is a different stopping time
$\tau$ for each $r$, but the notation does not show this.) If $B$ is the union of
the $B_r$ for rational $r$, then $P(B) = 0$ and, for $\omega \notin B$ and every
rational $r$, the first point of $Z(\omega)$ following $r$ is a limit of other,
larger points of $Z(\omega)$. Suppose that $\omega \notin B$ and $t \in Z(\omega)$,
where $t > 0$; it is to be shown that $t$ is a limit of other points of
$Z(\omega)$. If $t$ is the limit of smaller points of $Z(\omega)$, there is of course
nothing to prove. Otherwise, there is a rational $r$ such that $r < t$ and
$W(\cdot, \omega)$ does not vanish in $[r, t)$; but then, since
$\omega \notin B_r$, $t$ is a limit of larger points $s$ that lie in $Z(\omega)$.
This completes the proof of Theorem 37.4 under the provisional assumption that
(37.16) is a Brownian motion. ∎


The Strong Markov Property 
Fix $t_0 \ge 0$ and put

(37.17)    $W'_t = W_{t_0 + t} - W_{t_0}, \qquad t \ge 0$.

It is easily checked that $[W'_t: t \ge 0]$ has the finite-dimensional distributions
appropriate to Brownian motion. As the other properties are obvious, it is in
fact a Brownian motion.


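Let

(37.18)    $\mathcal{F}_t = \sigma[W_s: s \le t]$.

The random variables (37.17) are independent of $\mathcal{F}_{t_0}$. To see this,
suppose that $0 \le s_1 \le \cdots \le s_j \le t_0$ and $0 \le t_1 \le \cdots \le t_k$.
Put $u_i = t_0 + t_i$. Since the increments are independent,
$(W'_{t_1}, W'_{t_2} - W'_{t_1}, \ldots, W'_{t_k} - W'_{t_{k-1}}) = (W_{u_1} - W_{t_0}, W_{u_2} - W_{u_1}, \ldots, W_{u_k} - W_{u_{k-1}})$
is independent of $(W_{s_1}, W_{s_2} - W_{s_1}, \ldots, W_{s_j} - W_{s_{j-1}})$. But then
$(W'_{t_1}, W'_{t_2}, \ldots, W'_{t_k})$ is independent of
$(W_{s_1}, W_{s_2}, \ldots, W_{s_j})$. By Theorem 4.2, $(W'_{t_1}, \ldots, W'_{t_k})$ is
independent of $\mathcal{F}_{t_0}$. Thus

(37.19)    $P([(W'_{t_1}, \ldots, W'_{t_k}) \in H] \cap A) = P[(W'_{t_1}, \ldots, W'_{t_k}) \in H]\,P(A) = P[(W_{t_1}, \ldots, W_{t_k}) \in H]\,P(A), \qquad A \in \mathcal{F}_{t_0}$,

where the second equality follows because (37.17) is a Brownian motion. This
holds for all $H$ in $\mathcal{R}^k$.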

The problem now is to prove all this when $t_0$ is replaced by a stopping time
$\tau$, a nonnegative random variable for which

(37.20)    $[\omega: \tau(\omega) \le t] \in \mathcal{F}_t, \qquad t \ge 0$.

It will be assumed that $\tau$ is finite, at least with probability 1. Since
$[\tau = t] = [\tau \le t] - \bigcup_n [\tau \le t - n^{-1}]$, (37.20) implies that

(37.21)    $[\omega: \tau(\omega) = t] \in \mathcal{F}_t, \qquad t \ge 0$.

The conditions (37.20) and (37.21) are analogous to the conditions (7.18) and
(35.18), which prevent prevision on the part of the gambler.

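Now $\mathcal{F}_{t_0}$ contains the information on the past of the Brownian motion
up to time $t_0$, and the analogue for $\tau$ is needed. Let $\mathcal{F}_\tau$ consist
of all measurable sets $M$ for which

(37.22)    $M \cap [\omega: \tau(\omega) \le t] \in \mathcal{F}_t$

for all $t$. (See (35.20) for the analogue in discrete time.) Note that
$\mathcal{F}_\tau$ is a $\sigma$-field and $\tau$ is measurable $\mathcal{F}_\tau$. Since
$M \cap [\tau = t] = M \cap [\tau \le t] \cap [\tau = t]$,

(37.23)    $M \cap [\omega: \tau(\omega) = t] \in \mathcal{F}_t$

for $M$ in $\mathcal{F}_\tau$. For example, $\tau = \inf[t: W_t = 1]$ is a stopping time
and $[\inf_{s \le \tau} W_s > -1]$ is in $\mathcal{F}_\tau$.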


Theorem 37.5. Let $\tau$ be a stopping time, and put

(37.24)    $W^*_t(\omega) = W_{\tau(\omega)+t}(\omega) - W_{\tau(\omega)}(\omega), \qquad t \ge 0$.

Then $[W^*_t: t \ge 0]$ is a Brownian motion, and it is independent of
$\mathcal{F}_\tau$; that is, $\sigma[W^*_t: t \ge 0]$ is independent of $\mathcal{F}_\tau$:

(37.25)    $P([(W^*_{t_1}, \ldots, W^*_{t_k}) \in H] \cap M) = P[(W^*_{t_1}, \ldots, W^*_{t_k}) \in H]\,P(M) = P[(W_{t_1}, \ldots, W_{t_k}) \in H]\,P(M)$

for $H$ in $\mathcal{R}^k$ and $M$ in $\mathcal{F}_\tau$.

That the transformation (37.24) preserves Brownian motion is the strong
Markov property.† Part of the conclusion is that the $W^*_t$ are random variables.


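Proof. Suppose first that $\tau$ has countable range $V$ and let $t_0$ be the
general point of $V$. Since

$[\omega: W^*_t(\omega) \in H] = \bigcup_{t_0 \in V} [\omega: W_{t_0+t}(\omega) - W_{t_0}(\omega) \in H,\ \tau(\omega) = t_0]$,

$W^*_t$ is a random variable. Also,

$P([(W^*_{t_1}, \ldots, W^*_{t_k}) \in H] \cap M) = \sum_{t_0 \in V} P([(W^*_{t_1}, \ldots, W^*_{t_k}) \in H] \cap M \cap [\tau = t_0])$.

If $M \in \mathcal{F}_\tau$, then $M \cap [\tau = t_0] \in \mathcal{F}_{t_0}$ by (37.23).
Further, if $\tau = t_0$, then $W^*_t$ coincides with $W'_t$ as defined by (37.17).
Therefore, (37.19) reduces this last sum to

$\sum_{t_0 \in V} P[(W_{t_1}, \ldots, W_{t_k}) \in H]\,P(M \cap [\tau = t_0]) = P[(W_{t_1}, \ldots, W_{t_k}) \in H]\,P(M)$.

This proves the first and third terms in (37.25) equal; to prove equality with
the middle term, simply consider the case $M = \Omega$.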


† Since the Brownian motion has independent increments, it is a Markov process
(see 33.9 and 33.10); hence the terminology.


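Thus the theorem holds if $\tau$ has countable range. For the general $\tau$, put

(37.26)    $\tau_n = \begin{cases} k2^{-n} & \text{if } (k-1)2^{-n} < \tau \le k2^{-n},\ k = 1, 2, \ldots, \\ 0 & \text{if } \tau = 0. \end{cases}$

If $k2^{-n} \le t < (k+1)2^{-n}$, then
$[\tau_n \le t] = [\tau \le k2^{-n}] \in \mathcal{F}_{k2^{-n}} \subset \mathcal{F}_t$. Thus
each $\tau_n$ is a stopping time. Suppose that $M \in \mathcal{F}_\tau$ and
$k2^{-n} \le t < (k+1)2^{-n}$. Then
$M \cap [\tau_n \le t] = M \cap [\tau \le k2^{-n}] \in \mathcal{F}_{k2^{-n}} \subset \mathcal{F}_t$.
Thus $\mathcal{F}_\tau \subset \mathcal{F}_{\tau_n}$. Let
$W^{(n)}_t(\omega) = W_{\tau_n(\omega)+t}(\omega) - W_{\tau_n(\omega)}(\omega)$; that is,
let $W^{(n)}_t$ be the $W^*_t$ corresponding to the stopping time $\tau_n$. If
$M \in \mathcal{F}_\tau$, then $M \in \mathcal{F}_{\tau_n}$, and by an application of
(37.25) to the discrete case already treated,

(37.27)    $P([(W^{(n)}_{t_1}, \ldots, W^{(n)}_{t_k}) \in H] \cap M) = P[(W_{t_1}, \ldots, W_{t_k}) \in H]\,P(M)$.

But $\tau_n(\omega) \to \tau(\omega)$ for each $\omega$, and by continuity of the sample
paths, $W^{(n)}_t(\omega) \to W^*_t(\omega)$ for each $\omega$. Condition on $M$ and apply
Lemma 1 with $(W^{(n)}_{t_1}, \ldots, W^{(n)}_{t_k})$ for $X_n$,
$(W^*_{t_1}, \ldots, W^*_{t_k})$ for $X$, and the distribution function of
$(W_{t_1}, \ldots, W_{t_k})$ for $F = F_n$. Then (37.25) follows from (37.27). ∎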


The $\tau$ in the proof of Theorem 37.4 is a stopping time, and so (37.16) is a
Brownian motion, as required in that proof. Further applications will be
given below.

If $\mathcal{F}^* = \sigma[W^*_t: t \ge 0]$, then according to (37.25) (and Theorem 4.2)
the $\sigma$-fields $\mathcal{F}_\tau$ and $\mathcal{F}^*$ are independent:

(37.28)    $P(A \cap B) = P(A)P(B), \qquad A \in \mathcal{F}_\tau,\ B \in \mathcal{F}^*$.


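For fixed $t$ define $\tau_n$ by (37.26) but with $t2^{-n}$ in place of $2^{-n}$ at
each occurrence. Then $[W_\tau < x] \cap [\tau \le t]$ is the limit superior of the
sets $[W_{\tau_n} < x] \cap [\tau_n \le t]$, each of which lies in $\mathcal{F}_t$. This
proves that $[W_\tau < x]$ lies in $\mathcal{F}_\tau$ and hence that $W_\tau$ is
measurable $\mathcal{F}_\tau$. Since $\tau$ is measurable $\mathcal{F}_\tau$,

(37.29)    $[(\tau, W_\tau) \in H] \in \mathcal{F}_\tau$

for planar Borel sets $H$.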


The Reflection Principle 
For a stopping time $\tau$, define

(37.30)    $W''_t = \begin{cases} W_t & \text{if } t \le \tau, \\ W_\tau - (W_t - W_\tau) & \text{if } t \ge \tau. \end{cases}$

The sample path for $[W''_t: t \ge 0]$ is the same as the sample path for
$[W_t: t \ge 0]$ up to $\tau$, and beyond that it is reflected through the point
$W_\tau$. See the figure.




The process defined by (37.30) is a Brownian motion, and to prove it, one
need only check the finite-dimensional distributions:
$P[(W_{t_1}, \ldots, W_{t_k}) \in H] = P[(W''_{t_1}, \ldots, W''_{t_k}) \in H]$. By the
argument starting with (37.26), it is enough to consider the case where $\tau$ has
countable range, and for this it is enough to check the equation when the sets
are intersected with $[\tau = t_0]$.

Consider for notational simplicity a pair of points:

(37.31)    $P[\tau = t_0,\ (W_s, W_t) \in H] = P[\tau = t_0,\ (W''_s, W''_t) \in H]$.

If $s \le t \le t_0$, this holds because the two events are identical. Suppose next
that $s \le t_0 \le t$. Since $[\tau = t_0]$ lies in $\mathcal{F}_{t_0}$, it follows by the
independence of the increments, symmetry, and the definition (37.30) that

$P[\tau = t_0,\ (W_s, W_{t_0}) \in I,\ W_t - W_{t_0} \in J]$
$\quad = P[\tau = t_0,\ (W_s, W_{t_0}) \in I,\ -(W_t - W_{t_0}) \in J]$
$\quad = P[\tau = t_0,\ (W''_s, W''_{t_0}) \in I,\ W''_t - W''_{t_0} \in J]$.

If $K = I \times J$, this is

$P[\tau = t_0,\ (W_s, W_{t_0}, W_t - W_{t_0}) \in K] = P[\tau = t_0,\ (W''_s, W''_{t_0}, W''_t - W''_{t_0}) \in K]$,

and by the $\pi$-$\lambda$ theorem it follows for all $K \in \mathcal{R}^3$. For the
appropriate $K$, this gives (37.31). The remaining case, $t_0 \le s \le t$, is similar.

These ideas can be used to derive in a very simple way the distribution of
$M_t = \sup_{s \le t} W_s$. Suppose that $x > 0$. Let $\tau = \inf[s: W_s \ge x]$, define
$W''$ by (37.30), and put $\tau'' = \inf[s: W''_s \ge x]$ and
$M''_t = \sup_{s \le t} W''_s$. Since $\tau'' = \tau$ and $W''$ is another Brownian
motion, reflection through the point $W_\tau = x$ shows that

$P[M_t \ge x] = P[\tau \le t]$
$\quad = P[\tau \le t,\ W_t \le x] + P[\tau \le t,\ W_t > x]$
$\quad = P[\tau'' \le t,\ W''_t \le x] + P[\tau \le t,\ W_t > x]$
$\quad = P[\tau \le t,\ W_t \ge x] + P[\tau \le t,\ W_t > x] = 2P[W_t \ge x]$.

Therefore,

(37.32)    $P[M_t \ge x] = \frac{2}{\sqrt{2\pi}} \int_{x/\sqrt{t}}^{\infty} e^{-u^2/2}\,du$.

This argument, an application of the reflection principle, becomes quite
transparent when referred to the diagram.
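The same reflection argument has an exact counterpart for the simple symmetric random walk, which can be verified by counting paths. The sketch below is an illustrative discrete analogue, not the Brownian statement itself; the function names are hypothetical. For integer $x \ge 1$ it compares the number of $\pm 1$ paths of length $n$ whose running maximum reaches $x$ with the number ending at or above $x$ plus the number ending strictly above $x$, mirroring the last line of the chain leading to (37.32).

```python
from math import comb

def paths_with_max_at_least(n, x):
    """Count +-1 paths of length n whose running maximum reaches x (x >= 1)."""
    alive = {0: 1}   # position -> number of partial paths that have stayed below x
    hit = 0          # number of full-length paths that have reached x
    for step in range(n):
        nxt = {}
        for pos, cnt in alive.items():
            for move in (-1, 1):
                new = pos + move
                if new >= x:
                    # Once x is reached, every continuation of the path counts.
                    hit += cnt * 2 ** (n - step - 1)
                else:
                    nxt[new] = nxt.get(new, 0) + cnt
        alive = nxt
    return hit

def paths_ending_at(n, y):
    """Count +-1 paths of length n that end at y."""
    if (n + y) % 2 or abs(y) > n:
        return 0
    return comb(n, (n + y) // 2)

def reflection_count(n, x):
    """#[S_n >= x] + #[S_n > x], the discrete form of the reflection identity."""
    at_least = sum(paths_ending_at(n, y) for y in range(x, n + 1))
    greater = sum(paths_ending_at(n, y) for y in range(x + 1, n + 1))
    return at_least + greater
```

The two counts agree for every $n$ and $x$; in the Brownian case the term corresponding to $[S_n = x]$ has probability 0, which is why the limit is simply $2P[W_t \ge x]$.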


Skorohod Embedding * 


Suppose that $X_1, X_2, \ldots$ are independent and identically distributed random
variables with mean 0 and variance $\sigma^2$. A powerful method, due to Skorohod,
of studying the partial sums $S_n = X_1 + \cdots + X_n$ is to construct an
increasing sequence $\tau_0 = 0, \tau_1, \tau_2, \ldots$ of stopping times such that
$W(\tau_n)$ has the same distribution as $S_n$. The differences
$\tau_k - \tau_{k-1}$ will turn out to be independent and identically distributed
with mean $\sigma^2$, so that by the law of large numbers
$n^{-1}\tau_n = n^{-1}\sum_{k=1}^{n} (\tau_k - \tau_{k-1})$ is likely to be near
$\sigma^2$. But if $\tau_n$ is near $n\sigma^2$, then by the continuity of Brownian
motion paths $W(\tau_n)$ will be near $W(n\sigma^2)$, and so the distribution of
$S_n/\sigma\sqrt{n}$, which coincides with the distribution of
$W(\tau_n)/\sigma\sqrt{n}$, will be near the distribution of
$W(n\sigma^2)/\sigma\sqrt{n}$; that is, it will be near the standard normal
distribution. The method will thus yield another proof of the central limit
theorem, one independent of the characteristic-function arguments of Section 27.

But it will also give more. For example, the distribution of
$\max_{k \le n} S_k/\sigma\sqrt{n}$ is exactly the distribution of
$\max_{k \le n} W(\tau_k)/\sigma\sqrt{n}$, and this in turn is near the distribution
of $\sup_{t \le n\sigma^2} W(t)/\sigma\sqrt{n}$, which can be written down explicitly
because of (37.32). It will thus be possible to derive the limiting distribution
of $\max_{k \le n} S_k$. The joint behavior of the partial sums is closely related to
the behavior of Brownian motion paths.


The Skorohod construction involves the class $\mathcal{T}$ of stopping times for
which

(37.33)    $E[W_\tau] = 0$,
(37.34)    $E[\tau] = E[W_\tau^2]$,

and

(37.35)    $E[\tau^2] \le 4E[W_\tau^4]$.

† See Problem 37.18 for another application.
* The rest of this section, which requires martingale theory, may be omitted.

Lemma 2. All bounded stopping times are members of $\mathcal{T}$.


Proof. Define $Y_{\theta,t} = \exp(\theta W_t - \tfrac{1}{2}\theta^2 t)$ for all $\theta$
and for $t \ge 0$. Suppose that $s \le t$ and $A \in \mathcal{F}_s$. Since Brownian
motion has independent increments,

$\int_A Y_{\theta,t}\,dP = \int_A e^{\theta W_s - \theta^2 s/2}\,dP \cdot E\left[e^{\theta(W_t - W_s) - \theta^2(t-s)/2}\right]$,

and a calculation with moment generating functions (see Example 21.2)
shows that

(37.36)    $\int_A Y_{\theta,s}\,dP = \int_A Y_{\theta,t}\,dP, \qquad s \le t,\ A \in \mathcal{F}_s$.

This says that for $\theta$ fixed, $[Y_{\theta,t}: t \ge 0]$ is a continuous-time
martingale adapted to the $\sigma$-fields $\mathcal{F}_t$. It is the
moment-generating-function martingale associated with the Brownian motion.

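Let $f(\theta, t)$ denote the right side of (37.36). By Theorem 16.8,

$\frac{\partial}{\partial\theta} f(\theta, t) = \int_A Y_{\theta,t}\,(W_t - \theta t)\,dP$,

$\frac{\partial^2}{\partial\theta^2} f(\theta, t) = \int_A Y_{\theta,t}\left[(W_t - \theta t)^2 - t\right]dP$,

$\frac{\partial^4}{\partial\theta^4} f(\theta, t) = \int_A Y_{\theta,t}\left[(W_t - \theta t)^4 - 6(W_t - \theta t)^2 t + 3t^2\right]dP$.

Differentiate the other side of the equation (37.36) the same way and set
$\theta = 0$. The result is

$\int_A W_s\,dP = \int_A W_t\,dP, \qquad s \le t,\ A \in \mathcal{F}_s$,

$\int_A (W_s^2 - s)\,dP = \int_A (W_t^2 - t)\,dP, \qquad s \le t,\ A \in \mathcal{F}_s$,

$\int_A (W_s^4 - 6W_s^2 s + 3s^2)\,dP = \int_A (W_t^4 - 6W_t^2 t + 3t^2)\,dP, \qquad s \le t,\ A \in \mathcal{F}_s$.

This gives three more martingales: If $Z_t$ is any of the three random variables

(37.37)    $W_t, \qquad W_t^2 - t, \qquad W_t^4 - 6W_t^2 t + 3t^2$,

then $Z_0 = 0$, $Z_t$ is integrable and measurable $\mathcal{F}_t$, and

(37.38)    $\int_A Z_s\,dP = \int_A Z_t\,dP, \qquad s \le t,\ A \in \mathcal{F}_s$.

In particular, $E[Z_t] = E[Z_0] = 0$.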
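As a consistency check on (37.37), the three expectations can be computed directly from the moments of the normal distribution. The lines below are a worked verification under the standard facts $E[W_t^2] = t$ and $E[W_t^4] = 3t^2$ for a mean-zero normal variable with variance $t$; the third-derivative step is included only to show how the fourth-order polynomial arises.

```latex
\frac{\partial^{3}}{\partial\theta^{3}} Y_{\theta,t}
  = Y_{\theta,t}\bigl[(W_t - \theta t)^{3} - 3t\,(W_t - \theta t)\bigr],
\qquad
\frac{\partial^{4}}{\partial\theta^{4}} Y_{\theta,t}
  = Y_{\theta,t}\bigl[(W_t - \theta t)^{4} - 6t\,(W_t - \theta t)^{2} + 3t^{2}\bigr],

E[W_t] = 0, \qquad
E[W_t^{2} - t] = t - t = 0, \qquad
E[W_t^{4} - 6 W_t^{2} t + 3t^{2}] = 3t^{2} - 6t^{2} + 3t^{2} = 0 .
```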


If $\tau$ is a stopping time with finite range $\{t_1, \ldots, t_m\}$ bounded by $t$,
then (37.38) implies that

$E[Z_\tau] = \sum_i \int_{[\tau = t_i]} Z_{t_i}\,dP = \sum_i \int_{[\tau = t_i]} Z_t\,dP = E[Z_t] = 0$.

Suppose that $\tau$ is bounded by $t$ but does not necessarily have finite range.
Put $\tau_n = k2^{-n}t$ if $(k-1)2^{-n}t < \tau \le k2^{-n}t$, $1 \le k \le 2^n$, and put
$\tau_n = 0$ if $\tau = 0$. Then $\tau_n$ is a stopping time and $E[Z_{\tau_n}] = 0$. For
each of the three possibilities (37.37) for $Z$, $\sup_{s \le t} |Z_s|$ is integrable
because of (37.32). It therefore follows by the dominated convergence theorem
that $E[Z_\tau] = \lim_n E[Z_{\tau_n}] = 0$.

Thus $E[Z_\tau] = 0$ for every bounded stopping time $\tau$. The three cases
(37.37) give

$E[W_\tau] = E[W_\tau^2 - \tau] = E[W_\tau^4 - 6W_\tau^2\tau + 3\tau^2] = 0$.

This implies (37.33), (37.34), and

$0 = E[W_\tau^4] - 6E[W_\tau^2\tau] + 3E[\tau^2] \ge E[W_\tau^4] - 6E^{1/2}[W_\tau^4]\,E^{1/2}[\tau^2] + 3E[\tau^2]$.

If $C = E^{1/2}[W_\tau^4]$ and $x = E^{1/2}[\tau^2]$, the inequality is
$0 \ge q(x) = 3x^2 - 6Cx + C^2$. Each zero of $q$ is at most $2C$, and $q$ is negative
only between these two zeros. Therefore, $x \le 2C$, which implies (37.35). ∎
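The last step of the proof of Lemma 2 rests on Schwarz's inequality and on the location of the zeros of the quadratic $q$. The worked lines below spell this out; the numerical value is rounded.

```latex
E[W_\tau^{2}\,\tau] \le E^{1/2}[W_\tau^{4}]\; E^{1/2}[\tau^{2}] = Cx
\quad\text{(Schwarz)},

q(x) = 3x^{2} - 6Cx + C^{2} = 0
\iff
x = \frac{6C \pm \sqrt{36C^{2} - 12C^{2}}}{6}
  = C\Bigl(1 \pm \tfrac{\sqrt{6}}{3}\Bigr),

C\Bigl(1 + \tfrac{\sqrt{6}}{3}\Bigr) \approx 1.8165\,C \le 2C
\quad\Longrightarrow\quad
x \le 2C, \quad\text{that is,}\quad E[\tau^{2}] \le 4\,E[W_\tau^{4}] .
```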


Lemma 3. Suppose that $\tau$ and $\tau_n$ are stopping times, that each $\tau_n$ is a
member of $\mathcal{T}$, and that $\tau_n \to \tau$ with probability 1. Then $\tau$ is a
member of $\mathcal{T}$ if (i) $E[W_{\tau_n}^4] \le E[W_\tau^4] < \infty$ for all $n$, or if
(ii) the $W_{\tau_n}^4$ are uniformly integrable.


Proof. Since Brownian motion paths are continuous, $W_{\tau_n} \to W_\tau$ with
probability 1. Each of the two hypotheses (i) and (ii) implies that
$E[W_{\tau_n}^4]$ is bounded and hence that $E[\tau_n^2]$ is bounded, and it follows
(see (16.28)) that the sequences $\{\tau_n\}$, $\{W_{\tau_n}\}$, and
$\{W_{\tau_n}^2\}$ are uniformly integrable. Hence (37.33) and (37.34) for $\tau$
follow by Theorem 16.14 from the same relations for the $\tau_n$. The first
hypothesis implies that $\liminf_n E[W_{\tau_n}^4] \le E[W_\tau^4]$, and the second
implies that $\lim_n E[W_{\tau_n}^4] = E[W_\tau^4]$. In either case it follows by
Fatou's lemma that
$E[\tau^2] \le \liminf_n E[\tau_n^2] \le 4\liminf_n E[W_{\tau_n}^4] \le 4E[W_\tau^4]$. ∎




Suppose that $a, b \ge 0$ and $a + b > 0$, and let $\tau(a, b)$ be the hitting time
for the set $\{-a, b\}$: $\tau(a, b) = \inf[t: W_t \in \{-a, b\}]$. By (37.13),
$\tau(a, b)$ is finite with probability 1, and it is a stopping time because
$\tau(a, b) \le t$ if and only if for every $m$ there is a rational $r \le t$ for which
$W_r$ is within $m^{-1}$ of $-a$ or of $b$. From
$|W(\min\{\tau(a, b), n\})| \le \max\{a, b\}$ it follows by Lemma 3(ii) that
$\tau(a, b)$ is a member of $\mathcal{T}$. Since $W_{\tau(a,b)}$ assumes only the values
$-a$ and $b$, $E[W_{\tau(a,b)}] = 0$ implies that

(37.39)    $P[W_{\tau(a,b)} = -a] = \frac{b}{a+b}, \qquad P[W_{\tau(a,b)} = b] = \frac{a}{a+b}$.

This is obvious on grounds of symmetry in the case $a = b$.
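The exit probabilities (37.39) have an exact analogue for the simple symmetric random walk with integer barriers, which can be computed by solving the harmonic equations for the exit probability. The sketch below treats that discrete analogue only (an assumption made for illustration, not the Brownian computation in the text), and the function name is hypothetical.

```python
from fractions import Fraction

def prob_hit_b_before_minus_a(a, b):
    """P(simple symmetric walk from 0 reaches b before -a); a >= 0, b >= 1 integers."""
    # h(i) = P(reach b before -a | start at i), with h(-a) = 0, h(b) = 1,
    # and h(i) = (h(i-1) + h(i+1)) / 2 strictly between the barriers.
    # Forward sweep: write h(i) = alpha[i] * h(i+1) for i = -a, ..., b-1.
    alpha = {-a: Fraction(0)}
    for i in range(-a + 1, b):
        alpha[i] = 1 / (2 - alpha[i - 1])
    # Back substitution from h(b) = 1 down to h(0).
    h = Fraction(1)
    for i in range(b - 1, -1, -1):
        h = alpha[i] * h
    return h
```

For example, `prob_hit_b_before_minus_a(2, 3)` returns `Fraction(2, 5)`, in agreement with $a/(a+b)$; the mean of the exit position is then $b \cdot \frac{a}{a+b} - a \cdot \frac{b}{a+b} = 0$, as (37.33) requires.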

Let $\mu$ be a probability measure on the line with mean 0. The program is to
construct a stopping time $\tau$ for which $W_\tau$ has distribution $\mu$. Assume
that $\mu\{0\} < 1$, since otherwise $\tau = 0$ obviously works. If $\mu$ consists of
two point masses, they must for some positive $a$ and $b$ be a mass of $b/(a+b)$
at $-a$ and a mass of $a/(a+b)$ at $b$; in this case $\tau(a, b)$ is by (37.39) the
required stopping time. The general case will be treated by adding together
stopping times of this sort.

Consider a random variable $X$ having distribution $\mu$. (The probability
space for $X$ has nothing to do with the space the given Brownian motion is
defined on.) The technique will be to represent $X$ as the limit of a martingale
$X_1, X_2, \ldots$ of a simple form and then to duplicate the martingale by
$W_{\tau_1}, W_{\tau_2}, \ldots$ for stopping times $\tau_n$; the $\tau_n$ will have a
limit $\tau$ such that $W_\tau$ has the same distribution as $X$.

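The first step is to construct sets

$\Delta_n: \quad a_0^{(n)} < a_1^{(n)} < \cdots < a_{r_n}^{(n)}$

and corresponding partitions

$\mathcal{P}^n: \quad I_0^n = (-\infty, a_0^{(n)}], \qquad I_k^n = (a_{k-1}^{(n)}, a_k^{(n)}],\ 1 \le k \le r_n, \qquad I_{r_n+1}^n = (a_{r_n}^{(n)}, \infty)$.

Let $M(H)$ be the conditional mean:

$M(H) = \frac{1}{\mu(H)} \int_H x\,\mu(dx)$  if  $\mu(H) > 0$.

Let $\Delta_1$ consist of the single point $M(R^1) = E[X] = 0$, so that
$\mathcal{P}^1$ consists of $I_0^1 = (-\infty, 0]$ and $I_1^1 = (0, \infty)$. Suppose that
$\Delta_n$ and $\mathcal{P}^n$ are given. If $\mu((I_k^n)^\circ) > 0$, split $I_k^n$ by
adding to $\Delta_n$ the point $M(I_k^n)$, which lies in $(I_k^n)^\circ$; if
$\mu((I_k^n)^\circ) = 0$, $I_k^n$ appears again in $\mathcal{P}^{n+1}$.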




Let $\mathcal{G}_n$ be the $\sigma$-field generated by the sets $[X \in I_k^n]$, and put
$X_n = E[X \| \mathcal{G}_n]$. Then $X_1, X_2, \ldots$ is a martingale and
$X_n = M(I_k^n)$ on $[X \in I_k^n]$. The $X_n$ have finite range, and their joint
distributions can be written out explicitly. In fact,
$[X_1 = M(I_{k_1}^1), \ldots, X_n = M(I_{k_n}^n)] = [X \in I_{k_1}^1, \ldots, X \in I_{k_n}^n]$,
and this set is empty unless $I_{k_1}^1 \supset \cdots \supset I_{k_n}^n$, in which case
it is $[X_n = M(I_{k_n}^n)] = [X \in I_{k_n}^n]$. Therefore, if $k_{n-1} = j$ and
$I_j^{n-1} = I_{k-1}^n \cup I_k^n$,

$P[X_n = M(I_{k-1}^n) \mid X_1 = M(I_{k_1}^1), \ldots, X_{n-1} = M(I_j^{n-1})] = \frac{\mu(I_{k-1}^n)}{\mu(I_j^{n-1})}$

and

$P[X_n = M(I_k^n) \mid X_1 = M(I_{k_1}^1), \ldots, X_{n-1} = M(I_j^{n-1})] = \frac{\mu(I_k^n)}{\mu(I_j^{n-1})}$,

provided the conditioning event has positive probability. Thus the martingale
$\{X_n\}$ has the Markov property, and if $x = M(I_j^{n-1})$, $u = M(I_{k-1}^n)$, and
$v = M(I_k^n)$, then the conditional distribution of $X_n$ given $X_{n-1} = x$ is
concentrated at the two points $u$ and $v$ and has mean $x$. The structure of
$\{X_n\}$ is determined by these conditional probabilities together with the
distribution

$P[X_1 = M(I_0^1)] = \mu(I_0^1), \qquad P[X_1 = M(I_1^1)] = \mu(I_1^1)$

of $X_1$.

If $\mathcal{G} = \sigma(\bigcup_n \mathcal{G}_n)$, then $X_n \to E[X \| \mathcal{G}]$ with
probability 1 by the martingale theorem (Theorem 35.6). But, in fact,
$X_n \to X$ with probability 1, as the following argument shows. Let $B$ be the
union of all open sets of $\mu$-measure 0. Then $B$ is a countable disjoint union
of open intervals; enlarge $B$ by adding to it any endpoints of $\mu$-measure 0
these intervals may have. Then $\mu(B) = 0$, and $x \notin B$ implies that
$\mu(x - \varepsilon, x] > 0$ and $\mu[x, x + \varepsilon) > 0$ for all positive
$\varepsilon$. Suppose that $x = X(\omega) \notin B$ and let $x_n = X_n(\omega)$. Let
$I_{k_n}^n$ be the element of $\mathcal{P}^n$ containing $x$; then
$x_n = M(I_{k_n}^n)$ and $I_{k_n}^n \downarrow I$ for some interval $I$. Suppose that
$x_n < x - \varepsilon$ for $n$ in an infinite sequence $N$ of integers. Then $x_n$ is
the left endpoint of $I_{k_{n+1}}^{n+1}$ for $n$ in $N$ and converges along $N$ to the
left endpoint, say $a$, of $I$, and $(x - \varepsilon, x] \subset I$. Further,
$x_n = M(I_{k_n}^n) \to M(I)$ along $N$, so that $M(I) = a$. But this is impossible
because $\mu(x - \varepsilon, x] > 0$. Therefore, $x_n \ge x - \varepsilon$ for large
$n$. Similarly, $x_n \le x + \varepsilon$ for large $n$, and so $x_n \to x$. Thus
$X_n(\omega) \to X(\omega)$ if $X(\omega) \notin B$, the probability of which is 1.
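For a distribution with finitely many atoms, the partitions $\mathcal{P}^n$ and the martingale $X_n$ can be computed exactly. The following sketch assumes a finite discrete $\mu$ with mean 0, given as a dictionary from atoms to `Fraction` probabilities; the representation of intervals as pairs `(lo, hi)` standing for $(lo, hi]$ and all function names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

NEG_INF, POS_INF = float("-inf"), float("inf")

def atoms_in(mu, lo, hi):
    """Atoms v of mu with lo < v <= hi, that is, atoms in the interval (lo, hi]."""
    return {v: p for v, p in mu.items() if lo < v <= hi}

def cond_mean(mu, lo, hi):
    """Conditional mean M((lo, hi]); None if the interval has mass 0."""
    inside = atoms_in(mu, lo, hi)
    mass = sum(inside.values())
    if mass == 0:
        return None
    return sum(v * p for v, p in inside.items()) / mass

def refine(mu, partition):
    """One step from P^n to P^{n+1}: split intervals whose interior has positive mass."""
    new_partition = []
    for lo, hi in partition:
        interior_mass = sum(p for v, p in mu.items() if lo < v < hi)
        if interior_mass > 0:
            m = cond_mean(mu, lo, hi)
            new_partition += [(lo, m), (m, hi)]
        else:
            new_partition.append((lo, hi))
    return new_partition

def martingale_values(mu, n_steps):
    """Return [X_1, ..., X_n]; each entry maps an atom v to the value of X_k on [X = v]."""
    # P^1 is obtained by splitting the line at E[X] = 0.
    partition = [(NEG_INF, Fraction(0)), (Fraction(0), POS_INF)]
    result = []
    for _ in range(n_steps):
        x_n = {}
        for lo, hi in partition:
            m = cond_mean(mu, lo, hi)
            for v in atoms_in(mu, lo, hi):
                x_n[v] = m
        result.append(x_n)
        partition = refine(mu, partition)
    return result
```

With four atoms $\pm 1, \pm 2$ of mass $1/4$ each, $X_1$ takes the values $\pm 3/2$ and $X_2$ already coincides with $X$. For finitely many atoms the construction stabilizes after finitely many steps, a degenerate instance of the convergence $X_n \to X$.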

Now $X_1 = E[X \| \mathcal{G}_1]$ has mean 0, and its distribution consists of point
masses at $-a = M(I_0^1)$ and $b = M(I_1^1)$. If $\tau_1 = \tau(a, b)$ is the hitting
time to $\{-a, b\}$, then (see (37.39)) $\tau_1$ is a stopping time, a member of
$\mathcal{T}$, and $W_{\tau_1}$ has the same distribution as $X_1$.

Let $\tau_2$ be the infimum of those $t$ for which $t \ge \tau_1$ and $W_t$ is one of
the points $M(I_k^2)$, $0 \le k \le r_2 + 1$. By (37.13), $\tau_2$ is finite with
probability 1; it is a stopping time, because $\tau_2 \le t$ if and only if for
every $m$ there are rationals $r$ and $s$ such that $r \le s + m^{-1}$, $r \le t$,
$s \le t$, $W_r$ is within $m^{-1}$ of one of the points $M(I_j^1)$, and $W_s$ is within
$m^{-1}$ of one of the points $M(I_k^2)$. Since $|W(\min\{\tau_2, n\})|$ is at most
the maximum of the values $|M(I_k^2)|$, it follows by Lemma 3(ii) that $\tau_2$ is a
member of $\mathcal{T}$.

Define $W^*_t$ by (37.24) with $\tau_1$ for $\tau$. If $x = M(I_j^1)$, then $x$ is an
endpoint common to two adjacent intervals $I_{k-1}^2$ and $I_k^2$; put
$u = M(I_{k-1}^2)$ and $v = M(I_k^2)$. If $W_{\tau_1} = x$, then $u$ and $v$ are the only
possible values of $W_{\tau_2}$. If $\tau^*$ is the first time the Brownian motion
$[W^*_t: t \ge 0]$ hits $u - x$ or $v - x$, then by (37.39),

$P[W^*_{\tau^*} = u - x] = \frac{v - x}{v - u}, \qquad P[W^*_{\tau^*} = v - x] = \frac{x - u}{v - u}$.

On the set $[W_{\tau_1} = x]$, $\tau_2$ coincides with $\tau_1 + \tau^*$, and it follows
by (37.28) that

$P[W_{\tau_1} = x,\ W_{\tau_2} = v] = P[W_{\tau_1} = x,\ x + W^*_{\tau^*} = v] = P[W_{\tau_1} = x]\,\frac{x - u}{v - u}$.

This, together with the same computation with $u$ in place of $v$, shows that for
$W_{\tau_1} = x$ the conditional distribution of $W_{\tau_2}$ is concentrated at the
two points $u$ and $v$ and has mean $x$. Thus the conditional distribution of
$W_{\tau_2}$ given $W_{\tau_1}$ coincides with the conditional distribution of $X_2$
given $X_1$. Since $W_{\tau_1}$ and $X_1$ have the same distribution, the random
vectors $(W_{\tau_1}, W_{\tau_2})$ and $(X_1, X_2)$ have the same distribution.

An inductive extension of this argument proves the existence of a sequence
of stopping times $\tau_n$ such that $\tau_1 \le \tau_2 \le \cdots$, each $\tau_n$ is a
member of $\mathcal{T}$, and for each $n$, $W_{\tau_1}, \ldots, W_{\tau_n}$ have the same
joint distribution as $X_1, \ldots, X_n$. Now suppose that $X$ has finite variance.
Since $\tau_n$ is a member of $\mathcal{T}$,
$E[\tau_n] = E[W_{\tau_n}^2] = E[X_n^2] = E[E^2[X \| \mathcal{G}_n]] \le E[X^2]$ by Jensen's
inequality (34.7). Thus $\tau = \lim_n \tau_n$ is finite with probability 1.
Obviously it is a stopping time, and by path continuity,
$W_{\tau_n} \to W_\tau$ with probability 1. Since $X_n \to X$ with probability 1, it
is a consequence of the following lemma that $W_\tau$ has the distribution of $X$.


Lemma 4. If $X_n \to X$ and $Y_n \to Y$ with probability 1, and if $X_n$ and $Y_n$
have the same distribution, then so do $X$ and $Y$.


Proof.† By two applications of (4.9),

$P[X \le x] \le P[X < x + \varepsilon] \le \liminf_n P[X_n \le x + \varepsilon] \le \limsup_n P[Y_n \le x + \varepsilon] \le P[Y \le x + \varepsilon]$.

Let $\varepsilon \to 0$: $P[X \le x] \le P[Y \le x]$. Now interchange the roles of $X$
and $Y$. ∎


Since $X_n^2 \le E[X^2 \| \mathcal{G}_n]$, the $X_n^2$ are uniformly integrable by the
lemma preceding Theorem 35.6. By the monotone convergence theorem and Theorem
16.14, $E[\tau] = \lim_n E[\tau_n] = \lim_n E[W_{\tau_n}^2] = \lim_n E[X_n^2] = E[X^2] = E[W_\tau^2]$.
If $E[X^4] < \infty$, then $E[W_{\tau_n}^4] = E[X_n^4] \le E[X^4] = E[W_\tau^4]$ (Jensen's
inequality again), and so $\tau$ is a member of $\mathcal{T}$ by Lemma 3(i). Hence
$E[\tau^2] \le 4E[W_\tau^4]$.

This construction establishes the first of Skorohod’s embedding theorems: 


Theorem 37.6. Suppose that $X$ is a random variable with mean 0 and finite
variance. There is a stopping time $\tau$ such that $W_\tau$ has the same
distribution as $X$, $E[\tau] = E[X^2]$, and $E[\tau^2] \le 4E[X^4]$.


Of course, the last inequality is trivial unless $E[X^4]$ is finite. The theorem
could be stated in terms not of $X$ but of its distribution, the point being that
the probability space $X$ is defined on is irrelevant. Skorohod's second
embedding theorem is this:


Theorem 37.7. Suppose that $X_1, X_2, \ldots$ are independent and identically
distributed random variables with mean 0 and finite variance, and put
$S_n = X_1 + \cdots + X_n$. There is a nondecreasing sequence $\tau_1, \tau_2, \ldots$
of stopping times such that the $W_{\tau_n}$ have the same joint distributions as the
$S_n$ and $\tau_1, \tau_2 - \tau_1, \tau_3 - \tau_2, \ldots$ are independent and
identically distributed random variables satisfying
$E[\tau_n - \tau_{n-1}] = E[X_1^2]$ and $E[(\tau_n - \tau_{n-1})^2] \le 4E[X_1^4]$.


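Proof. The method is to repeat the construction above inductively. For
notational clarity write $W_t = W_t^{(1)}$ and put
$\mathcal{F}_t^{(1)} = \sigma[W_s^{(1)}: 0 \le s \le t]$ and
$\mathcal{F}^{(1)} = \sigma[W_t^{(1)}: t \ge 0]$. Let $\delta_1$ be the stopping time of
Theorem 37.6, so that $W_{\delta_1}^{(1)}$ and $X_1$ have the same distribution. Let
$\mathcal{F}_{\delta_1}^{(1)}$ be the class of $M$ such that
$M \cap [\delta_1 \le t] \in \mathcal{F}_t^{(1)}$ for all $t$.

Now put $W_t^{(2)} = W_{\delta_1 + t}^{(1)} - W_{\delta_1}^{(1)}$,
$\mathcal{F}_t^{(2)} = \sigma[W_s^{(2)}: 0 \le s \le t]$, and
$\mathcal{F}^{(2)} = \sigma[W_t^{(2)}: t \ge 0]$. By another application of Theorem 37.6,
construct a stopping time $\delta_2$ for the Brownian motion
$[W_t^{(2)}: t \ge 0]$ in such a way that $W_{\delta_2}^{(2)}$ has the same distribution
as $X_1$. In fact, use for $\delta_2$ the very same martingale construction as for
$\delta_1$, so that $(\delta_1, W_{\delta_1}^{(1)})$ and $(\delta_2, W_{\delta_2}^{(2)})$ have
the same distribution. Since $\mathcal{F}_{\delta_1}^{(1)}$ and $\mathcal{F}^{(2)}$ are
independent (see (37.28)), it follows (see (37.29)) that
$(\delta_1, W_{\delta_1}^{(1)})$ and $(\delta_2, W_{\delta_2}^{(2)})$ are independent.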


† This is obvious from the weak-convergence point of view.


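Let $\mathcal{F}_{\delta_2}^{(2)}$ be the class of $M$ such that
$M \cap [\delta_2 \le t] \in \mathcal{F}_t^{(2)}$ for all $t$. If
$W_t^{(3)} = W_{\delta_2 + t}^{(2)} - W_{\delta_2}^{(2)}$ and $\mathcal{F}^{(3)}$ is the
$\sigma$-field generated by these random variables, then again
$\mathcal{F}_{\delta_2}^{(2)}$ and $\mathcal{F}^{(3)}$ are independent. These two
$\sigma$-fields are contained in $\mathcal{F}^{(2)}$, which is independent of
$\mathcal{F}_{\delta_1}^{(1)}$. Therefore, the three $\sigma$-fields
$\mathcal{F}_{\delta_1}^{(1)}$, $\mathcal{F}_{\delta_2}^{(2)}$, $\mathcal{F}^{(3)}$ are
independent. The procedure therefore extends inductively to give independent,
identically distributed random vectors $(\delta_n, W_{\delta_n}^{(n)})$. If
$\tau_n = \delta_1 + \cdots + \delta_n$, then
$W_{\tau_n}^{(1)} = W_{\delta_1}^{(1)} + \cdots + W_{\delta_n}^{(n)}$ has the distribution
of $X_1 + \cdots + X_n$. ∎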


Invariance* 


If $E[X_1^2] = \sigma^2$, then, since the random variables $\tau_n - \tau_{n-1}$ of
Theorem 37.7 are independent and identically distributed, the strong law of
large numbers (Theorem 22.1) applies and hence so does the weak one:

(37.40)    $P[|n^{-1}\tau_n - \sigma^2| \ge \varepsilon] \to 0$.

(If $E[X_1^4] < \infty$, so that the $\tau_n - \tau_{n-1}$ have second moments, this
follows immediately by Chebyshev's inequality.) Now $S_n$ has the distribution of
$W(\tau_n)$ and $\tau_n$ is near $n\sigma^2$ by (37.40); hence $S_n$ should have nearly
the distribution of $W(n\sigma^2)$, namely the normal distribution with mean 0 and
variance $n\sigma^2$.

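To prove this, choose an increasing sequence of integers $N_k$ such that
$P[|n^{-1}\tau_n - \sigma^2| \ge k^{-1}] < k^{-1}$ for $n \ge N_k$, and put
$\varepsilon_n = k^{-1}$ for $N_k \le n < N_{k+1}$. Then $\varepsilon_n \to 0$ and
$P[|n^{-1}\tau_n - \sigma^2| \ge \varepsilon_n] < \varepsilon_n$. By two applications of
(37.32),

$\Delta_n(\varepsilon) = P\left[\frac{|W(n\sigma^2) - W(\tau_n)|}{\sigma\sqrt{n}} \ge \varepsilon\right]$
$\quad \le P[|n^{-1}\tau_n - \sigma^2| \ge \varepsilon_n] + P\left[\sup_{|t - n\sigma^2| \le \varepsilon_n n} |W(t) - W(n\sigma^2)| \ge \varepsilon\sigma\sqrt{n}\right]$
$\quad < \varepsilon_n + 4P[|W(\varepsilon_n n)| \ge \varepsilon\sigma\sqrt{n}]$,

and it follows by Chebyshev's inequality that $\lim_n \Delta_n(\varepsilon) = 0$. Since
$S_n$ is distributed as $W(\tau_n)$,

$P\left[\frac{W(n\sigma^2)}{\sigma\sqrt{n}} \le x - \varepsilon\right] - \Delta_n(\varepsilon) \le P\left[\frac{S_n}{\sigma\sqrt{n}} \le x\right] \le P\left[\frac{W(n\sigma^2)}{\sigma\sqrt{n}} \le x + \varepsilon\right] + \Delta_n(\varepsilon)$.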
<P 


*This topic may be omitted. 




Here $W(n\sigma^2)/\sigma\sqrt{n}$ can be replaced by a random variable $N$ with the
standard normal distribution, and letting $n \to \infty$ and then $\epsilon \to 0$ shows that

$$\lim_n P\left[\frac{S_n}{\sigma\sqrt{n}} \le x\right] = P[N \le x].$$


This gives a new proof of the central limit theorem for independent, identically
distributed random variables with second moments (the Lindeberg-Lévy
theorem, Theorem 27.1). Observe that none of the convergence theory of
Chapter 5 has been used.

This proof of the central limit theorem is an application of the invariance
principle: $S_n$ has nearly the distribution of $W(n\sigma^2)$, and the distribution of
the latter does not depend on (vary with) the distribution common to the $X_n$.
More can be said if the $X_n$ have fourth moments.

For each $n$, define a stochastic process $[Y_n(t): 0 \le t \le 1]$ by $Y_n(0, \omega) = 0$
and

(37.41)  $$Y_n(t, \omega) = \frac{1}{\sigma\sqrt{n}} S_k(\omega) \quad \text{if } \frac{k-1}{n} < t \le \frac{k}{n}.$$


If $k/n = t > 0$ and $n$ is large, then $k$ is large, too, and $Y_n(t) = t^{1/2} S_k/\sigma\sqrt{k}$ is
by the central limit theorem approximately normally distributed with mean 0
and variance $t$. Since the $X_n$ are independent, the increments of (37.41)
should be approximately independent, and so the process should behave
approximately as a Brownian motion does.
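The rescaling in (37.41) can be illustrated numerically. The following is only a sketch under stated assumptions: the steps are symmetric $\pm 1$ variables (so $\sigma = 1$), and the function names `rescaled_walk`, `evaluate`, and `sample_variance_at` are hypothetical helpers introduced here, not notation from the text. It builds the step function $Y_n$ from a list of steps and estimates the variance of $Y_n(t)$, which the heuristic above suggests should be close to $t$.

```python
import math
import random


def rescaled_walk(steps, sigma):
    """Return [Y_n(k/n) for k = 1..n], where Y_n = S_k / (sigma * sqrt(n)) on ((k-1)/n, k/n]."""
    n = len(steps)
    scale = sigma * math.sqrt(n)
    values, partial_sum = [], 0.0
    for x in steps:
        partial_sum += x
        values.append(partial_sum / scale)
    return values


def evaluate(values, t):
    """Evaluate the step function Y_n at t in [0, 1], with Y_n(0) = 0."""
    if t <= 0:
        return 0.0
    n = len(values)
    k = min(n, math.ceil(t * n))  # the index k with (k-1)/n < t <= k/n
    return values[k - 1]


def sample_variance_at(t, n, trials, rng):
    """Estimate E[Y_n(t)^2] for +-1 steps (assumed sigma = 1) by simulation."""
    total = 0.0
    for _ in range(trials):
        steps = [rng.choice((-1, 1)) for _ in range(n)]
        y = evaluate(rescaled_walk(steps, 1.0), t)
        total += y * y
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # The printed estimate should be near t = 0.5; it is random, so no exact value is claimed.
    print(sample_variance_at(0.5, 200, 2000, rng))
```

The simulation only probes one-dimensional marginals; it says nothing about the joint behavior of the increments, which is what Theorem 37.8 addresses.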

Let $\tau_n$ be the stopping times of Theorem 37.7, and in analogy with (37.41)
put $Z_n(0) = 0$ and

(37.42)  $$Z_n(t) = \frac{1}{\sigma\sqrt{n}} W(\tau_k) \quad \text{if } \frac{k-1}{n} < t \le \frac{k}{n}.$$

By construction, the finite-dimensional distributions of $[Y_n(t): 0 \le t \le 1]$ coincide
with those of $[Z_n(t): 0 \le t \le 1]$. It will be shown that the latter process
nearly coincides with $[W(tn\sigma^2)/\sigma\sqrt{n}: 0 \le t \le 1]$, which is itself a Brownian
motion over the time interval $[0, 1]$ (see (37.11)). Put $W_n(t) = W(tn\sigma^2)/\sigma\sqrt{n}$.

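Let $B_n(\delta)$ be the event that $|\tau_k - k\sigma^2| \ge \delta n\sigma^2$ for some $k \le n$. By
Kolmogorov's inequality (22.9),

(37.43)  $$P(B_n(\delta)) \le \frac{\mathrm{Var}[\tau_n]}{\delta^2 n^2 \sigma^4} \le \frac{4E[X_1^4]}{\delta^2 n \sigma^4}.$$

If $(k-1)n^{-1} < t \le kn^{-1}$ and $n > \delta^{-1}$, then

$$\left|\frac{\tau_k}{n\sigma^2} - t\right| \le \left|\frac{\tau_k}{n\sigma^2} - \frac{k}{n}\right| + \frac{1}{n} \le 2\delta$$

on the event $(B_n(\delta))^c$, and so

$$|Z_n(t) - W_n(t)| = \left|W_n\left(\frac{\tau_k}{n\sigma^2}\right) - W_n(t)\right| \le \sup_{|s-t| \le 2\delta} |W_n(s) - W_n(t)|$$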


on $(B_n(\delta))^c$. Since the distribution of this last random variable is unchanged
if the $W_n(t)$ are replaced by $W(t)$,

$$P\left[\sup_{t \le 1} |Z_n(t) - W_n(t)| \ge \epsilon\right] \le P(B_n(\delta)) + P\left[\sup_{t \le 1} \sup_{|s-t| \le 2\delta} |W(s) - W(t)| \ge \epsilon\right].$$

Let $n \to \infty$ and then $\delta \to 0$; it follows by (37.43) and the continuity of
Brownian motion paths that

(37.44)  $$\lim_n P\left[\sup_{t \le 1} |Z_n(t) - W_n(t)| \ge \epsilon\right] = 0$$

for positive $\epsilon$. Since the processes (37.41) and (37.42) have the same
finite-dimensional distributions, this proves the following general invariance
principle or functional central limit theorem.


Theorem 37.8. Suppose that $X_1, X_2, \ldots$ are independent, identically distributed
random variables with mean 0, variance $\sigma^2$, and finite fourth moments,
and define $Y_n(t)$ by (37.41). There exist (on another probability space),
for each $n$, processes $[Z_n(t): 0 \le t \le 1]$ and $[W_n(t): 0 \le t \le 1]$ such that the first
has the same finite-dimensional distributions as $[Y_n(t): 0 \le t \le 1]$, the second is
a Brownian motion, and $P[\sup_{t \le 1} |Z_n(t) - W_n(t)| \ge \epsilon] \to 0$ for positive $\epsilon$.


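As an application, consider the maximum $M_n = \max_{k \le n} S_k$. Now
$M_n/\sigma\sqrt{n} = \sup_t Y_n(t)$ has the same distribution as $\sup_t Z_n(t)$, and it follows
by (37.44) that

$$P\left[\left|\sup_{t \le 1} Z_n(t) - \sup_{t \le 1} W_n(t)\right| \ge \epsilon\right] \to 0.$$

But $P[\sup_{t \le 1} W_n(t) \ge x] = P[\sup_{t \le 1} W(t) \ge x] = 2P[N \ge x]$ for $x \ge 0$ by
(37.32). Therefore,

(37.45)  $$P\left[\frac{M_n}{\sigma\sqrt{n}} \ge x\right] \to 2P[N \ge x], \qquad x \ge 0.$$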
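The limit (37.45) can be checked roughly by simulation. The sketch below assumes a symmetric $\pm 1$ random walk (so $\sigma = 1$); the names `reflection_tail` and `empirical_max_tail` are hypothetical helpers introduced for illustration. The first evaluates $2P[N \ge x]$ through the complementary error function, and the second estimates $P[M_n/\sqrt{n} \ge x]$ by Monte Carlo.

```python
import math
import random


def reflection_tail(x):
    """2 * P[N >= x] for a standard normal N and x >= 0, written via erfc."""
    return math.erfc(x / math.sqrt(2.0))


def empirical_max_tail(x, n, trials, rng):
    """Fraction of symmetric +-1 walks with max_{k <= n} S_k >= x * sqrt(n)."""
    threshold = x * math.sqrt(n)
    hits = 0
    for _ in range(trials):
        position, running_max = 0, 0
        for _ in range(n):
            position += rng.choice((-1, 1))
            if position > running_max:
                running_max = position
        if running_max >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # The two printed numbers should be close for large n; the first is a random estimate.
    print(empirical_max_tail(1.0, 400, 1500, rng), reflection_tail(1.0))
```

Agreement for one value of $x$ and one step distribution is of course only a plausibility check, not evidence for the theorem in general.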




PROBLEMS 


37.1. 36.2↑ Show that $K(s,t) = \min\{s, t\}$ is nonnegative-definite; use Problem 36.2
to prove the existence of a process with the finite-dimensional distributions
prescribed for Brownian motion.

37.2. Let $X(t)$ be independent, standard normal variables, one for each dyadic
rational $t$ (Theorem 20.4; the unit interval can be used as the probability
space). Let $W(0) = 0$ and $W(n) = \sum_{k=1}^{n} X(k)$. Suppose that $W(t)$ is already
defined for dyadic rationals of rank $n$, and put

$$W\left(\frac{2k+1}{2^{n+1}}\right) = \frac{1}{2} W\left(\frac{k}{2^n}\right) + \frac{1}{2} W\left(\frac{k+1}{2^n}\right) + \frac{1}{2^{1+n/2}} X\left(\frac{2k+1}{2^{n+1}}\right).$$

Prove by induction that the $W(t)$ for dyadic $t$ have the finite-dimensional
distributions prescribed for Brownian motion. Now construct a Brownian
motion with continuous paths by the argument leading to Theorem 37.1. This
avoids an appeal to Kolmogorov's existence theorem.

37.3. ↑ For each $n$ define new variables $W_n(t)$ by setting $W_n(k/2^n) = W(k/2^n)$ for
dyadics of order $n$ and interpolating linearly in between. Set $\delta_n =
\sup_{t \le n} |W_{n+1}(t) - W_n(t)|$, and show that

$$\delta_n = \max_{0 \le k < n2^n} \left| W\left(\frac{2k+1}{2^{n+1}}\right) - \frac{1}{2} W\left(\frac{k}{2^n}\right) - \frac{1}{2} W\left(\frac{k+1}{2^n}\right) \right|.$$

The construction in the preceding problem makes it clear that the difference
here is normal with variance $1/2^{n+2}$. Find positive $x_n$ such that $\sum x_n$ and
$\sum P[\delta_n > x_n]$ both converge, and conclude that outside a set of probability 0,
$W_n(t,\omega)$ converges uniformly over bounded intervals. Replace $W(t,\omega)$ by
$\lim_n W_n(t,\omega)$. This gives another construction of a Brownian motion with
continuous paths.

37.4. 36.6↑ Let $T = [0,\infty)$, and let $P$ be a probability measure on $(R^T, \mathcal{R}^T)$ having
the finite-dimensional distributions prescribed for Brownian motion. Let $C$
consist of the continuous elements of $R^T$.

(a) Show that $P_*(C) = 0$, or $P^*(R^T - C) = 1$ (see (3.9) and (3.10)). Thus
completing $(R^T, \mathcal{R}^T, P)$ will not give $C$ probability 1.

(b) Show that $P^*(C) = 1$.

37.5. Suppose that $[W_t: t \ge 0]$ is some stochastic process having independent,
stationary increments satisfying $E[W_t] = 0$ and $E[W_t^2] = t$. Show that if the
finite-dimensional distributions are preserved by the transformation (37.11),
then they must be those of Brownian motion.

37.6. Show that $\bigcap_{t > 0} \sigma[W_s: s \ge t]$ contains only sets of probability 0 and 1. Do the
same for $\bigcap_{\epsilon > 0} \sigma[W_t: 0 < t < \epsilon]$; give examples of sets in this $\sigma$-field.

37.7. Show by a direct argument that $W(\cdot, \omega)$ is with probability 1 of unbounded
variation on $[0, 1]$: Let $Y_n = \sum_{i=1}^{2^n} |W(i2^{-n}) - W((i-1)2^{-n})|$. Show that $Y_n$
has mean $2^{n/2} E[|W_1|]$ and variance at most $\mathrm{Var}[|W_1|]$. Conclude that
$\sum P[Y_n < n] < \infty$.


37.8. Show that the Poisson process as defined by (23.5) is measurable.

37.9. Show that for $T = [0, \infty)$ the coordinate-variable process $[Z_t: t \in T]$ on $(R^T, \mathcal{R}^T)$
is not measurable.

37.10. Extend Theorem 37.4 to the set $[t: W(t, \omega) = \alpha]$.

37.11. Let $\tau_\alpha$ be the first time the Brownian motion hits $\alpha > 0$: $\tau_\alpha = \inf[t: W_t \ge \alpha]$.
Show that the distribution of $\tau_\alpha$ has over $(0, \infty)$ the density

(37.46)  $$h_\alpha(t) = \frac{\alpha}{\sqrt{2\pi}} \frac{1}{t^{3/2}} e^{-\alpha^2/2t}.$$

Show that $E[\tau_\alpha] = \infty$. Show that $\tau_\alpha$ has the same distribution as $\alpha^2/N^2$, where
$N$ is a standard normal variable.

37.12. ↑ (a) Show by the strong Markov property that $\tau_\alpha$ and $\tau_{\alpha+\beta} - \tau_\alpha$ are
independent and that the latter has the same distribution as $\tau_\beta$. Conclude that
$h_\alpha * h_\beta = h_{\alpha+\beta}$. Show that $\beta\tau_\alpha$ has the same distribution as $\tau_{\alpha\sqrt{\beta}}$.

(b) Show that each $h_\alpha$ is stable (see Problem 28.10).

37.13. ↑ Suppose that $X_1, X_2, \ldots$ are independent and each has the distribution
(37.46).

(a) Show that $(X_1 + \cdots + X_n)/n^2$ also has the distribution (37.46). Contrast
this with the law of large numbers.

(b) Show that $P[n^{-2} \max_{k \le n} X_k \le x] \to \exp(-\alpha\sqrt{2/\pi x})$ for $x > 0$. Relate
this to Theorem 14.3.

37.14. 37.11↑ Let $p(s,t)$ be the probability that a Brownian path has at least one
zero in $(s, t)$. From (37.46) and the Markov property deduce

(37.47)  $$p(s,t) = \frac{2}{\pi} \arccos\sqrt{\frac{s}{t}}.$$

Hint: Condition with respect to $W_s$.

37.15. ↑ (a) Show that the probability of no zero in $(t, 1)$ is $(2/\pi)\arcsin\sqrt{t}$ and
hence that the position of the last zero preceding 1 is distributed over $(0, 1)$
with density $\pi^{-1}(t(1-t))^{-1/2}$.

(b) Similarly calculate the distribution of the position of the first zero following
time 1.

(c) Calculate the joint distribution of the two zeros in (a) and (b).

37.16. ↑ (a) Show by Theorem 37.8 that $\inf_{s \le u \le t} Y_n(u)$ and $\inf_{s \le u \le t} Z_n(u)$ both
converge in distribution to $\inf_{s \le u \le t} W(u)$ for $0 \le s \le t \le 1$. Prove a similar
result for the supremum.

(b) Let $A_n(s, t)$ be the event that $S_k$, the position at time $k$ in a symmetric
random walk, is 0 for at least one $k$ in the range $sn \le k \le tn$, and show that
$P(A_n(s,t)) \to (2/\pi)\arccos\sqrt{s/t}$.


(c) Let $T_n$ be the maximum $k$ such that $k \le n$ and $S_k = 0$. Show that $T_n/n$ has
asymptotically the distribution with density $\pi^{-1}(t(1-t))^{-1/2}$ over $(0,1)$. As
this density is larger at the ends of the interval than in the middle, the last time
during a night's play a gambler was even is more likely to be either early or late
than to be around midnight.

37.17. ↑ Show that $p(s,t) = p(t^{-1}, s^{-1}) = p(cs, ct)$. Check this by (37.47) and also
by the fact that the transformations (37.11) and (37.12) preserve the properties
of Brownian motion.

37.18. Deduce by the reflection principle that $(M_t, W_t)$ has density

$$\frac{2(2y - x)}{t\sqrt{2\pi t}} \exp\left[-\frac{(2y - x)^2}{2t}\right]$$

on the set where $y \ge 0$ and $y \ge x$. Now deduce from Theorem 37.8 the
corresponding limit theorem for symmetric random walk.

37.19. Show by means of the transformation (37.12) that for positive $a$ and $b$ the
probability is 1 that the process is within the boundary $-at < W_t < bt$ for all
sufficiently large $t$. Show that $a/(a + b)$ is the probability that it last touches
above rather than below.

37.20. The martingale calculation used for (37.39) also works for slanting boundaries.
For positive $a$, $b$, $r$, let $\tau$ be the smallest $t$ such that either $W_t = -a + rt$ or
$W_t = b + rt$, and let $p(a, b, r)$ be the probability that the exit is through the
upper barrier, that is, that $W_\tau = b + r\tau$.

(a) For the martingale $Y_{\theta,t}$ in the proof of Lemma 2, show that $E[Y_{\theta,t}] = 1$.
Operating formally at first, conclude that

(37.48)  $$E[e^{\theta W_\tau - \theta^2\tau/2}] = 1.$$

Take $\theta = 2r$, and note that $\theta W_\tau - \frac{1}{2}\theta^2\tau$ is then $2rb$ if the exit is above
(probability $p(a, b, r)$) and $-2ra$ if the exit is below (probability $1 - p(a, b, r)$).
Deduce

$$p(a,b,r) = \frac{1 - e^{-2ra}}{e^{2rb} - e^{-2ra}}.$$

(b) Show that $p(a,b,r) \to a/(a + b)$ as $r \to 0$, in agreement with (37.39).

(c) It remains to justify (37.48) for $\theta = 2r$. From $E[Y_{\theta,t}] = 1$ deduce

(37.49)  $$E[e^{2r(W_\sigma - r\sigma)}] = 1$$

for nonrandom $\sigma$. By the arguments in the proofs of Lemmas 2 and 3, show
that (37.49) holds for simple stopping times $\sigma$, for bounded ones, for $\sigma = \tau \wedge n$,
for $\sigma = \tau$.
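The midpoint refinement described in Problem 37.2 lends itself to a short computational sketch. The code below is an illustration only, not part of the problem set: the names `refine` and `dyadic_brownian` are hypothetical, the standard normal variables $X(t)$ are replaced by pseudorandom draws from Python's `random.gauss`, and only finitely many ranks are generated. Each refinement step keeps the existing values and fills in each midpoint as the average of its two neighbors plus an independent normal term of variance $1/2^{n+2}$.

```python
import random


def refine(values, rank, rng):
    """Given W at k/2**rank for k = 0..len(values)-1, return W at k/2**(rank+1)."""
    # Standard deviation of the midpoint correction: variance 1 / 2**(rank + 2).
    sd = 2.0 ** (-(rank + 2) / 2.0)
    refined = []
    for left, right in zip(values, values[1:]):
        refined.append(left)
        refined.append(0.5 * left + 0.5 * right + sd * rng.gauss(0.0, 1.0))
    refined.append(values[-1])
    return refined


def dyadic_brownian(horizon, max_rank, rng):
    """Values at k/2**max_rank, 0 <= k <= horizon * 2**max_rank, built up from integer times."""
    values = [0.0]
    for _ in range(horizon):
        values.append(values[-1] + rng.gauss(0.0, 1.0))  # W(n) = X(1) + ... + X(n)
    for rank in range(max_rank):
        values = refine(values, rank, rng)
    return values


if __name__ == "__main__":
    path = dyadic_brownian(horizon=2, max_rank=10, rng=random.Random(0))
    print(len(path))  # number of dyadic grid points generated on [0, 2]
```

Linear interpolation between the generated grid points gives the approximating paths $W_n$ of Problem 37.3.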


SECTION 38. NONDENUMERABLE PROBABILITIES* 


Introduction 


As observed a number of times above, the finite-dimensional distributions do 
not suffice to determine the character of the sample paths of a process. To 
obtain paths with natural regularity properties, the Poisson and Brownian 
motion processes were constructed by ad hoc methods. It is always possible 
to ensure that the paths have a certain very general regularity property called 
separability, and from this property will follow in appropriate circumstances 
various other desirable regularity properties. 

Section 4 dealt with "denumerable" probabilities; questions about path
functions involve all the time points and hence concern "nondenumerable"
probabilities.


Example 38.1. For a mathematically simple illustration of the fact that
path properties are not entirely determined by the finite-dimensional
distributions, consider a probability space $(\Omega, \mathcal{F}, P)$ on which is defined a positive
random variable $V$ with continuous distribution: $P[V = x] = 0$ for each $x$. For
$t \ge 0$, put $X(t, \omega) = 0$ for all $\omega$, and put

(38.1)  $$Y(t, \omega) = \begin{cases} 1 & \text{if } V(\omega) = t, \\ 0 & \text{if } V(\omega) \ne t. \end{cases}$$

Since $V$ has continuous distribution, $P[X_t = Y_t] = 1$ for each $t$, and so $[X_t:
t \ge 0]$ and $[Y_t: t \ge 0]$ are stochastic processes with identical finite-dimensional
distributions; for each $t_1, \ldots, t_k$, the distribution $\mu_{t_1 \ldots t_k}$ common to
$(X_{t_1}, \ldots, X_{t_k})$ and $(Y_{t_1}, \ldots, Y_{t_k})$ concentrates all its mass at the origin of $R^k$.
But what about the sample paths? Of course, $X(\cdot, \omega)$ is identically 0, but
$Y(\cdot, \omega)$ has a discontinuity: it is 1 at $t = V(\omega)$ and 0 elsewhere. It is because
the position of this discontinuity has a continuous distribution that the two
processes have the same finite-dimensional distributions. ■
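The contrast in Example 38.1 can be mimicked in a few lines. This is a sketch under assumptions: $V$ is taken to be exponentially distributed (any continuous distribution on the positive reals would do), the helper names `x_path`, `y_path`, and `agree_on` are hypothetical, and floating-point draws only approximate a continuous distribution. At any fixed finite set of times the two processes agree for every simulated draw, yet each path of $Y$ differs from the corresponding path of $X$ at the random time $V$.

```python
import random


def x_path(t, v):
    """X(t, omega): identically zero, whatever the value v = V(omega)."""
    return 0


def y_path(t, v):
    """Y(t, omega) as in (38.1): 1 exactly at t = v = V(omega), and 0 elsewhere."""
    return 1 if t == v else 0


def agree_on(times, v):
    """True if X and Y coincide at every time in the finite list `times`."""
    return all(x_path(t, v) == y_path(t, v) for t in times)


rng = random.Random(0)
fixed_times = [0.0, 0.25, 0.5, 1.0, 2.0]
# Assumed distribution for V: exponential with rate 1 (positive and continuous).
draws = [rng.expovariate(1.0) for _ in range(10_000)]

# Agreement at the fixed times for every draw, but never agreement along the whole path.
agree_at_fixed_times = all(agree_on(fixed_times, v) for v in draws)
paths_differ = all(y_path(v, v) != x_path(v, v) for v in draws)
```

No finite list of time points chosen in advance can detect the difference, which is exactly why the finite-dimensional distributions coincide.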


Definitions 


The idea of separability is to make a countable set of time points serve to
determine the properties of the process. In all that follows, the time set $T$
will for definiteness be taken as $[0, \infty)$. Most of the results hold with an
arbitrary subset of the line in the role of $T$.

As in Section 36, let $R^T$ be the set of all real functions over $T = [0, \infty)$. Let
$D$ be a countable, dense subset of $T$. A function $x$, an element of $R^T$, is
separable $D$, or separable with respect to $D$, if for each $t$ in $T$ there exists a

*This section may be omitted. 




sequence $t_1, t_2, \ldots$ of points such that

(38.2)  $$t_n \in D, \qquad t_n \to t, \qquad x(t_n) \to x(t).$$

(Because of the middle condition here, it was redundant to require $D$ dense
at the outset.) For $t$ in $D$, (38.2) imposes no condition on $x$, since $t_n$ may be
taken as $t$. An $x$ separable with respect to $D$ is determined by its values at
the points of $D$. Note, however, that separability requires that (38.2) hold for
every $t$, an uncountable set of conditions. It is not hard to show that the set
of functions separable with respect to $D$ lies outside $\mathcal{R}^T$.


Example 38.2. If x is everywhere continuous or right-continuous, then it 
is separable with respect to every countable, dense D. 

Suppose that $x(t)$ is 0 for $t \ne v$ and 1 for $t = v$, where $v > 0$. Then $x$ is not
separable with respect to $D$ unless $v$ lies in $D$. The paths $Y(\cdot, \omega)$ in Example
38.1 are of this form. ■


The condition for separability can be stated another way: $x$ is separable $D$
if and only if for every $t$ and every open interval $I$ containing $t$, $x(t)$ lies in
the closure of $[x(s): s \in I \cap D]$.
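The closure condition can be tested exactly for the spike function of Example 38.2 when $D$ is the set of dyadic rationals. The sketch below is an illustration under assumptions: `spike`, `dyadic_points`, and `distance_to_values` are hypothetical names, exact rational arithmetic stands in for real numbers, and only dyadics up to a fixed rank are examined (which suffices here, because the spike function vanishes at every dyadic point other than $v$). It computes the distance from $x(t)$ to the set $[x(s): s \in I \cap D]$ for an interval $I$ around $t$.

```python
import math
from fractions import Fraction


def spike(v):
    """The function of Example 38.2: 1 at t = v and 0 elsewhere."""
    return lambda t: 1 if t == v else 0


def dyadic_points(lo, hi, rank):
    """Dyadic rationals k / 2**rank, k >= 0, in the open interval (lo, hi), as exact fractions."""
    denom = 2 ** rank
    k_min = max(0, math.floor(lo * denom) + 1)  # smallest k with k/denom > lo
    k_max = math.ceil(hi * denom) - 1           # largest k with k/denom < hi
    return [Fraction(k, denom) for k in range(k_min, k_max + 1)]


def distance_to_values(x, t, radius, rank):
    """min |x(s) - x(t)| over dyadic s of the given rank in (t - radius, t + radius)."""
    return min(abs(x(s) - x(t)) for s in dyadic_points(t - radius, t + radius, rank))


third, half = Fraction(1, 3), Fraction(1, 2)
gap_at_third = distance_to_values(spike(third), third, Fraction(1, 8), 10)  # v not dyadic
gap_at_half = distance_to_values(spike(half), half, Fraction(1, 8), 10)     # v dyadic
```

For $v = 1/3$ the gap stays at 1 however small the interval or large the rank, so $x(v)$ is not in the closure and separability fails; for $v = 1/2$ the point $v$ itself belongs to $D$ and the gap is 0.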

Suppose that $x$ is separable $D$ and that $I$ is an open interval in $T$. If
$\epsilon > 0$, then $x(t_0) + \epsilon > \sup_{t \in I} x(t) = u$ for some $t_0$ in $I$. By separability
$|x(s_0) - x(t_0)| < \epsilon$ for some $s_0$ in $I \cap D$. But then $x(s_0) + 2\epsilon > u$, so that

(38.3)  $$\sup_{t \in I} x(t) = \sup_{t \in I \cap D} x(t).$$

Similarly,

(38.4)  $$\inf_{t \in I} x(t) = \inf_{t \in I \cap D} x(t)$$

and

(38.5)  $$\sup_{t_0 < t < t_0 + \delta} |x(t) - x(t_0)| = \sup_{t_0 < t < t_0 + \delta,\ t \in D} |x(t) - x(t_0)|.$$


A stochastic process $[X_t: t \ge 0]$ on $(\Omega, \mathcal{F}, P)$ is separable $D$ if $D$ is a
countable, dense subset of $T = [0, \infty)$ and there is an $\mathcal{F}$-set $N$ such that
$P(N) = 0$ and such that the sample path $X(\cdot, \omega)$ is separable with respect to
$D$ for $\omega$ outside $N$. Finally, the process is separable if it is separable with
respect to some $D$; this $D$ is sometimes called a separant. In these definitions
it is assumed for the moment that $X(t, \omega)$ is a finite real number for each $t$
and $\omega$.


Example 38.3. If the sample path $X(\cdot, \omega)$ is continuous for each $\omega$, then
the process is separable with respect to each countable, dense $D$. This covers
Brownian motion as constructed in the preceding section. ■




Example 38.4. Suppose that $[W_t: t \ge 0]$ has the finite-dimensional
distributions of Brownian motion, but do not assume as in the preceding section
that the paths are necessarily continuous. Assume, however, that $[W_t: t \ge 0]$
is separable with respect to $D$. Fix $t_0$ and $\delta$. Choose sets $D_m = \{t_{m1}, \ldots, t_{mm}\}$
of $D$-points such that $t_0 < t_{m1} < \cdots < t_{mm} < t_0 + \delta$ and $D_m \uparrow D \cap (t_0, t_0 + \delta)$.
By the argument leading to (37.9),

(38.6)  $$P\left[\sup_{t_0 < t < t_0 + \delta,\ t \in D} |W_t - W_{t_0}| > \alpha\right] \le \frac{K\delta^2}{\alpha^4}.$$

For sample points outside the $N$ in the definition of separability, the
supremum in (38.6) is unaltered, because of (38.5), if the restriction $t \in D$ is
dropped. Since $P(N) = 0$,

$$P\left[\sup_{t_0 < t < t_0 + \delta} |W_t - W_{t_0}| > \alpha\right] \le \frac{K\delta^2}{\alpha^4}.$$

Define $M_n$ by (37.8) but with $r$ ranging over all the reals (not just over the
dyadic rationals) in $[k2^{-n}, (k + 2)2^{-n}]$. Then $P[M_n > n^{-1}] \le 4Kn^5/2^n$ follows
just as before. But for $\omega$ outside $B = [M_n > n^{-1} \text{ i.o.}]$, $W(\cdot, \omega)$ is
continuous. Since $P(B) = 0$, $W(\cdot, \omega)$ is continuous for $\omega$ outside an $\mathcal{F}$-set of
probability 0. If $(\Omega, \mathcal{F}, P)$ is complete, then the set of $\omega$ for which $W(\cdot, \omega)$ is
continuous is an $\mathcal{F}$-set of probability 1. Thus paths are continuous with
probability 1 for any separable process having the finite-dimensional
distributions of Brownian motion, provided that the underlying space is complete,
which can of course always be arranged. ■


As it will be shown below that there exists a separable process with any 
consistently prescribed set of finite-dimensional distributions, Example 38.4 
provides another approach to the construction of continuous Brownian 
motion. The value of the method lies in its generality. It must not, however, 
be imagined that separability automatically ensures smooth sample paths: 


Example 38.5. Suppose that the random variables $X_t$, $t \ge 0$, are independent,
each having the standard normal distribution. Let $D$ be any countable
set dense in $T = [0, \infty)$. Suppose that $I$ and $J$ are open intervals with rational
endpoints. Since the random variables $X_t$ with $t \in D \cap I$ are independent,
and since the value common to the $P[X_t \in J]$ is positive, the second
Borel-Cantelli lemma implies that with probability 1, $X_t \in J$ for some $t$ in
$D \cap I$. Since there are only countably many pairs $I$ and $J$ with rational
endpoints, there is an $\mathcal{F}$-set $N$ such that $P(N) = 0$ and such that for $\omega$
outside $N$ the set $[X(t, \omega): t \in D \cap I]$ is everywhere dense on the line for
every open interval $I$ in $T$. This implies that $[X_t: t \ge 0]$ is separable with
respect to $D$. But also of course it implies that the paths are highly irregular.
This irregularity is not a shortcoming of the concept of separability; it is a
necessary consequence of the properties of the finite-dimensional
distributions specified in this example. ■


Example 38.6. The process $[Y_t: t \ge 0]$ in Example 38.1 is not separable:
The path $Y(\cdot, \omega)$ is not separable $D$ unless $D$ contains the point $V(\omega)$. The
set of $\omega$ for which $Y(\cdot, \omega)$ is separable $D$ is thus contained in $[\omega: V(\omega) \in D]$,
a set of probability 0, since $D$ is countable and $V$ has a continuous
distribution. ■


Existence Theorems 


It will be proved in stages that for every consistent system of
finite-dimensional distributions there exists a separable process having those
distributions. Define $x$ to be separable $D$ at the point $t$ if there exist points $t_n$ in $D$
such that $t_n \to t$ and $x(t_n) \to x(t)$. Note that this is no restriction on $x$ if $t$
lies in $D$, and note that separability is the same thing as separability at
every $t$.


Lemma 1. Let $[X_t: t \ge 0]$ be a stochastic process on $(\Omega, \mathcal{F}, P)$. There
exists a countable, dense set $D$ in $[0, \infty)$, and there exists for each $t$ an $\mathcal{F}$-set
$N(t)$, such that $P(N(t)) = 0$ and such that for $\omega$ outside $N(t)$ the path function
$X(\cdot, \omega)$ is separable $D$ at $t$.


PROOF. Fix open intervals $I$ and $J$, and consider the probability

$$p(U) = P\left(\bigcap_{s \in U} [X_s \notin J]\right)$$

for countable subsets $U$ of $I \cap T$. As $U$ increases, the intersection here
decreases and so does $p(U)$. Choose $U_n$ so that $p(U_n) \to \inf_U p(U)$. If
$U(I, J) = \bigcup_n U_n$, then $U(I, J)$ is a countable subset of $I \cap T$ making $p(U)$
minimal:

(38.7)  $$P\left(\bigcap_{s \in U(I,J)} [X_s \notin J]\right) \le P\left(\bigcap_{s \in U} [X_s \notin J]\right)$$

for every countable subset $U$ of $I \cap T$. If $t \in I \cap T$, then

(38.8)  $$P\left([X_t \in J] \cap \bigcap_{s \in U(I,J)} [X_s \notin J]\right) = 0,$$

because otherwise (38.7) would fail for $U = U(I, J) \cup \{t\}$.

Let $D = \bigcup U(I, J)$, where the union extends over all open intervals $I$ and
$J$ with rational endpoints. Then $D$ is a countable, dense subset of $T$. For
each $t$ let

(38.9)  $$N(t) = \bigcup \left([X_t \in J] \cap \bigcap_{s \in U(I,J)} [X_s \notin J]\right),$$

where the union extends over all open intervals $J$ that have rational
endpoints and over all open intervals $I$ that have rational endpoints and contain
$t$. Then $N(t)$ is by (38.8) an $\mathcal{F}$-set such that $P(N(t)) = 0$.

Fix $t$ and $\omega \notin N(t)$. The problem is to show that $X(\cdot, \omega)$ is separable with
respect to $D$ at $t$. Given $n$, choose open intervals $I$ and $J$ that have rational
endpoints and lengths less than $n^{-1}$ and satisfy $t \in I$ and $X(t, \omega) \in J$. Since
$\omega$ lies outside (38.9), there must be an $s_n$ in $U(I, J)$ such that $X(s_n, \omega) \in J$.
But then $s_n \in D$, $|s_n - t| < n^{-1}$, and $|X(s_n, \omega) - X(t, \omega)| < n^{-1}$. Thus $s_n \to t$
and $X(s_n, \omega) \to X(t, \omega)$ for a sequence $s_1, s_2, \ldots$ in $D$. ■


For any countable $D$, the set of $\omega$ for which $X(\cdot, \omega)$ is separable with
respect to $D$ at $t$ is

(38.10)  $$\bigcap_{n=1}^{\infty} \bigcup_{|s-t| < n^{-1},\ s \in D} [\omega: |X(t, \omega) - X(s, \omega)| < n^{-1}].$$

This set lies in $\mathcal{F}$ for each $t$, and the point of the lemma is that it is possible
to choose $D$ in such a way that each of these sets has probability 1.


Lemma 2. Let $[X_t: t \ge 0]$ be a stochastic process on $(\Omega, \mathcal{F}, P)$. Suppose
that for all $t$ and $\omega$

(38.11)  $$a < X(t, \omega) < b.$$

Then there exists on $(\Omega, \mathcal{F}, P)$ a process $[X'_t: t \ge 0]$ having these three
properties:

(i) $P[X'_t = X_t] = 1$ for each $t$.

(ii) For some countable, dense subset $D$ of $[0, \infty)$, $X'(\cdot, \omega)$ is separable $D$
for every $\omega$ in $\Omega$.

(iii) For all $t$ and $\omega$,

(38.12)  $$a \le X'(t, \omega) \le b.$$


PROOF. Choose a countable, dense set $D$ and $\mathcal{F}$-sets $N(t)$ of probability
0 as in Lemma 1. If $t \in D$ or if $\omega \notin N(t)$, define $X'(t, \omega) = X(t, \omega)$. If $t \notin D$,
fix some sequence $\{s_n^{(t)}\}$ in $D$ for which $\lim_n s_n^{(t)} = t$, and define $X'(t, \omega) =
\limsup_n X(s_n^{(t)}, \omega)$ for $\omega \in N(t)$. To sum up,

(38.13)  $$X'(t, \omega) = \begin{cases} X(t, \omega) & \text{if } t \in D \text{ or } \omega \notin N(t), \\ \limsup_n X(s_n^{(t)}, \omega) & \text{if } t \notin D \text{ and } \omega \in N(t). \end{cases}$$

Since $N(t) \in \mathcal{F}$, $X'_t$ is measurable $\mathcal{F}$ for each $t$. Since $P(N(t)) = 0$,
$P[X_t = X'_t] = 1$ for each $t$.

Fix $t$ and $\omega$. If $t \in D$, then certainly $X'(\cdot, \omega)$ is separable $D$ at $t$, and so
assume $t \notin D$. If $\omega \notin N(t)$, then by the construction of $N(t)$, $X(\cdot, \omega)$ is
separable with respect to $D$ at $t$, so that there exist points $s_n$ in $D$ such that
$s_n \to t$ and $X(s_n, \omega) \to X(t, \omega)$. But $X(s_n, \omega) = X'(s_n, \omega)$ because $s_n \in D$, and
$X(t, \omega) = X'(t, \omega)$ because $\omega \notin N(t)$. Hence $X'(s_n, \omega) \to X'(t, \omega)$, and so
$X'(\cdot, \omega)$ is separable with respect to $D$ at $t$. Finally, suppose that $t \notin D$ and
$\omega \in N(t)$. Then $X'(t, \omega) = \lim_k X(s_{n_k}^{(t)}, \omega)$ for some sequence $\{n_k\}$ of integers.
As $k \to \infty$, $s_{n_k}^{(t)} \to t$ and $X'(s_{n_k}^{(t)}, \omega) = X(s_{n_k}^{(t)}, \omega) \to X'(t, \omega)$, so that again
$X'(\cdot, \omega)$ is separable with respect to $D$ at $t$. Clearly, (38.11) implies (38.12). ■


Example 38.7. One must allow for the possibility of equality in (38.12).
Suppose that $V(\omega) > 0$ for all $\omega$ and that $V$ has a continuous distribution.
Define

$$f(t) = \begin{cases} e^{-|t|} & \text{if } t \ne 0, \\ 0 & \text{if } t = 0, \end{cases}$$

and put $X(t, \omega) = f(t - V(\omega))$. If $[X'_t: t \ge 0]$ is any separable process with the
same finite-dimensional distributions as $[X_t: t \ge 0]$, then $X'(\cdot, \omega)$ must with
probability 1 assume the value 1 somewhere. In this case (38.11) holds for
$a < 0$ and $b = 1$, and equality in (38.12) cannot be avoided. ■
If

(38.14)  $$\sup_{t, \omega} |X(t, \omega)| < \infty,$$

then (38.11) holds for some $a$ and $b$. To treat the case in which (38.14) fails,
it is necessary to allow for the possibility of infinite values. If $x(t)$ is $\infty$ or
$-\infty$, replace the third condition in (38.2) by $x(t_n) \to \infty$ or $x(t_n) \to -\infty$. This
extends the definition of separability to functions $x$ that may assume infinite
values and to processes $[X_t: t \ge 0]$ for which $X(t, \omega) = \pm\infty$ is a possibility.


Theorem 38.1. If $[X_t: t \ge 0]$ is a finite-valued process on $(\Omega, \mathcal{F}, P)$, there
exists on the same space a separable process $[X'_t: t \ge 0]$ such that $P[X'_t = X_t] = 1$
for each $t$.




It is assumed for convenience here that $X(t, \omega)$ is finite for all $t$ and $\omega$,
although this is not really necessary. But in some cases infinite values for
certain $X'(t, \omega)$ cannot be avoided (see Example 38.8).


PROOF. If (38.14) holds, the result is an immediate consequence of
Lemma 2. The definition of separability allows an exceptional set $N$ of
probability 0; in the construction of Lemma 2 this set is actually empty, but it
is clear from the definition this could be arranged anyway.

The case in which (38.14) may fail could be treated by tracing through the
preceding proofs, making slight changes to allow for infinite values. A simple
argument makes this unnecessary. Let $g$ be a continuous, strictly increasing
mapping of $R^1$ onto $(0, 1)$. Let $Y(t, \omega) = g(X(t, \omega))$. Lemma 2 applies to $[Y_t:
t \ge 0]$; there exists a separable process $[Y'_t: t \ge 0]$ such that $P[Y'_t = Y_t] = 1$.
Since $0 < Y(t, \omega) < 1$, Lemma 2 ensures $0 \le Y'(t, \omega) \le 1$. Define

$$X'(t, \omega) = \begin{cases} -\infty & \text{if } Y'(t, \omega) = 0, \\ g^{-1}(Y'(t, \omega)) & \text{if } 0 < Y'(t, \omega) < 1, \\ +\infty & \text{if } Y'(t, \omega) = 1. \end{cases}$$

Then $[X'_t: t \ge 0]$ satisfies the requirements. Note that $P[X'_t = \pm\infty] = 0$ for
each $t$. ■
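The change of scale used in this proof is easy to make concrete. The sketch below assumes one particular choice of $g$, the logistic function, which the text does not specify; the names `g` and `g_inverse_extended` are introduced here for illustration. It shows the inverse extended to the closed interval $[0, 1]$ exactly as in the definition of $X'$, with $-\infty$ and $+\infty$ assigned at the endpoints.

```python
import math


def g(x):
    """An assumed choice of g: the logistic map, continuous and strictly increasing onto (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # computed this way to avoid overflow for very negative x
    return z / (1.0 + z)


def g_inverse_extended(y):
    """Inverse of g on (0, 1), extended by -infinity at 0 and +infinity at 1."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    if y == 0.0:
        return -math.inf
    if y == 1.0:
        return math.inf
    return math.log(y / (1.0 - y))


if __name__ == "__main__":
    print(g_inverse_extended(g(1.5)))   # recovers 1.5 up to rounding
    print(g_inverse_extended(1.0))      # inf: the value assigned where Y' reaches 1
```

In floating-point arithmetic `g` rounds to exactly 1.0 for large arguments, so the code only illustrates the bookkeeping of the extension; the mathematical $g$ never attains 0 or 1.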


Example 38.8. Suppose that $V(\omega) > 0$ for all $\omega$ and $V$ has a continuous
distribution. Define

$$h(t) = \begin{cases} |t|^{-1} & \text{if } t \ne 0, \\ 0 & \text{if } t = 0, \end{cases}$$

and put $X(t, \omega) = h(t - V(\omega))$. This is analogous to Example 38.7. If $[X'_t:
t \ge 0]$ is separable and has the finite-dimensional distributions of $[X_t: t \ge 0]$,
then $X'(\cdot, \omega)$ must with probability 1 assume the value $\infty$ for some $t$. ■


Combining Theorem 38.1 with Kolmogorov's existence theorem shows that
for any consistent system of finite-dimensional distributions $\mu_{t_1 \ldots t_k}$ there exists a
separable process with the $\mu_{t_1 \ldots t_k}$ as finite-dimensional distributions. As shown
in Example 38.4, this leads to another construction of Brownian motion with
continuous paths.


Consequences of Separability 


The next theorem implies in effect that, if the finite-dimensional distributions
of a process are such that it "should" have continuous paths, then it will in
fact have continuous paths if it is separable. Example 38.4 illustrates this.
The same thing holds for properties other than continuity.




Let $\bar{R}^T$ be the set of functions on $T = [0, \infty)$ with values that are ordinary
reals or else $\infty$ or $-\infty$. Thus $\bar{R}^T$ is an enlargement of the $R^T$ of Section 36, an
enlargement necessary because separability sometimes forces infinite values.
Define the function $Z_t$ on $\bar{R}^T$ by $Z_t(x) = Z(t, x) = x(t)$. This is just an
extension of the coordinate function (36.8). Let $\bar{\mathcal{R}}^T$ be the $\sigma$-field in $\bar{R}^T$
generated by the $Z_t$, $t \ge 0$.

Suppose that $A$ is a subset of $\bar{R}^T$, not necessarily in $\bar{\mathcal{R}}^T$. For $D \subset T = [0, \infty)$,
let $A_D$ consist of those elements $x$ of $\bar{R}^T$ that agree on $D$ with some
element $y$ of $A$:

(38.15)  $$A_D = \bigcup_{y \in A} \bigcap_{t \in D} [x \in \bar{R}^T: x(t) = y(t)].$$

Of course, $A \subset A_D$. Let $S_D$ denote the set of $x$ in $\bar{R}^T$ that are separable with
respect to $D$.

In the following theorem, $[X_t: t \ge 0]$ and $[X'_t: t \ge 0]$ are processes on
spaces $(\Omega, \mathcal{F}, P)$ and $(\Omega', \mathcal{F}', P')$, which may be distinct; the path functions
are $X(\cdot, \omega)$ and $X'(\cdot, \omega')$.


Theorem 38.2. Suppose of $A$ that for each countable, dense subset $D$ of
$T = [0, \infty)$, the set (38.15) satisfies

(38.16)  $$A_D \in \bar{\mathcal{R}}^T, \qquad A_D \cap S_D \subset A.$$

If $[X_t: t \ge 0]$ and $[X'_t: t \ge 0]$ have the same finite-dimensional distributions, if
$[\omega: X(\cdot, \omega) \in A]$ lies in $\mathcal{F}$ and has $P$-measure 1, and if $[X'_t: t \ge 0]$ is
separable, then $[\omega': X'(\cdot, \omega') \in A]$ contains an $\mathcal{F}'$-set of $P'$-measure 1.


If $(\Omega', \mathcal{F}', P')$ is complete, then of course $[\omega': X'(\cdot, \omega') \in A]$ is itself an
$\mathcal{F}'$-set of $P'$-measure 1.


PROOF. Suppose that $[X'_t: t \ge 0]$ is separable with respect to $D$. The
difference $[\omega': X'(\cdot, \omega') \in A_D] - [\omega': X'(\cdot, \omega') \in A]$ is by (38.16) a subset of
$[\omega': X'(\cdot, \omega') \in \bar{R}^T - S_D]$, which is contained in an $\mathcal{F}'$-set $N'$ of $P'$-measure
0. Since the two processes have the same finite-dimensional distributions
and hence induce the same distribution on $(\bar{R}^T, \bar{\mathcal{R}}^T)$, and since $A_D$ lies in
$\bar{\mathcal{R}}^T$, it follows that $P'[\omega': X'(\cdot, \omega') \in A_D] = P[\omega: X(\cdot, \omega) \in A_D] \ge P[\omega:
X(\cdot, \omega) \in A] = 1$. Thus the subset $[\omega': X'(\cdot, \omega') \in A_D] - N'$ of $[\omega': X'(\cdot, \omega')
\in A]$ lies in $\mathcal{F}'$ and has $P'$-measure 1. ■


Example 38.9. Consider the set $C$ of finite-valued, continuous functions
on $T$. If $x \in S_D$ and $y \in C$, and if $x$ and $y$ agree on a dense $D$, then $x$ and $y$
agree everywhere: $x = y$. Therefore, $C_D \cap S_D \subset C$. Further,

$$C_D = \bigcap_{\epsilon,\, t} \bigcup_{\delta} \bigcap_{s} [x \in \bar{R}^T: |x(s)| < \infty,\ |x(t)| < \infty,\ |x(s) - x(t)| < \epsilon],$$

where $\epsilon$ and $\delta$ range over the positive rationals, $t$ ranges over $D$, and the
inner intersection extends over the $s$ in $D$ satisfying $|s - t| < \delta$. Hence
$C_D \in \bar{\mathcal{R}}^T$. Thus $C$ satisfies the condition (38.16).

Theorem 38.2 now implies that if a process has continuous paths with
probability 1, then any separable process having the same finite-dimensional
distributions has continuous paths outside a set of probability 0. In particular,
a Brownian motion with continuous paths was constructed in the preceding
section, and so any separable process with the finite-dimensional
distributions of Brownian motion has continuous paths outside a set of probability 0.
The argument in Example 38.4 now becomes supererogatory. ■


Example 38.10. There is a somewhat similar argument for the step functions of
the Poisson process. Let $Z^+$ be the set of nonnegative integers; let $E$ consist of the
nondecreasing functions $x$ in $\bar{R}^T$ such that $x(t) \in Z^+$ for all $t$ and such that for every
$n \in Z^+$ there exists a nonempty interval $I$ such that $x(t) = n$ for $t \in I$. Then

$$E_D = \bigcap_{t \in D} [x: x(t) \in Z^+] \cap \bigcap_{s, t \in D,\ s < t} [x: x(s) \le x(t)] \cap \bigcap_{n=0}^{\infty} \bigcup_{I} \bigcap_{t \in D \cap I} [x: x(t) = n],$$

where $I$ ranges over the open intervals with rational endpoints. Thus $E_D \in \bar{\mathcal{R}}^T$.
Clearly, $E_D \cap S_D \subset E$, and so Theorem 38.2 applies.

In Section 23 was constructed a Poisson process with paths in $E$, and therefore any
separable process with the same finite-dimensional distributions will have paths in $E$
except for a set of probability 0. ■


Example 38.11. For $E$ as in Example 38.10, let $E_0$ consist of the elements of $E$
that are right-continuous; a function in $E$ need not lie in $E_0$, although at each $t$ it
must be continuous from one side or the other. The Poisson process as defined in
Section 23 by $N_t = \max[n: S_n \le t]$ (see (23.5)) has paths in $E_0$. But if $N'_t = \max[n:
S_n < t]$, then $[N'_t: t \ge 0]$ is separable and has the same finite-dimensional distributions,
but its paths are not in $E_0$. Thus $E_0$ does not satisfy the hypotheses of Theorem 38.2.
Separability does not help distinguish between continuity from the right and
continuity from the left. ■


Example 38.12. The class of sets $A$ satisfying (38.16) is closed under the
formation of countable unions and intersections but is not closed under complementation.
Define $X_t$ and $Y_t$ as in Example 38.1, and let $C$ be the set of continuous paths. Then
$[Y_t: t \ge 0]$ and $[X_t: t \ge 0]$ have the same finite-dimensional distributions, and the
latter is separable; $Y(\cdot, \omega)$ is in $\bar{R}^T - C$ for each $\omega$, and $X(\cdot, \omega)$ is in $\bar{R}^T - C$ for
no $\omega$. ■


Example 38.13. As a final example, consider the set $J$ of functions with
discontinuities of at most the first kind: $x$ is in $J$ if it is finite-valued, if $x(t+) = \lim_{s \downarrow t} x(s)$
exists (finite) for $t \ge 0$ and $x(t-) = \lim_{s \uparrow t} x(s)$ exists (finite) for $t > 0$, and if $x(t)$ lies
between $x(t+)$ and $x(t-)$ for $t > 0$. Continuous and right-continuous functions are
special cases.

Let $V$ denote the general system

(38.17)  $$V: k;\ r_1, \ldots, r_k;\ s_1, \ldots, s_k;\ \alpha_1, \ldots, \alpha_k,$$

where $k$ is an integer, where the $r_i$, $s_i$, and $\alpha_i$ are rational, and where

$$0 = r_1 < s_1 < r_2 < s_2 < \cdots < r_k < s_k.$$

Define

$$J(D, V, \epsilon) = \bigcap_{i=1}^{k} [x: \alpha_i < x(t) < \alpha_i + \epsilon,\ t \in (r_i, s_i) \cap D] \cap \bigcap_{i=2}^{k} [x: \min\{\alpha_{i-1}, \alpha_i\} < x(t) < \max\{\alpha_{i-1}, \alpha_i\} + \epsilon,\ t \in (s_{i-1}, r_i) \cap D].$$


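Let $\mathcal{V}_{m,k,\delta}$ be the class of systems (38.17) that have a fixed value for $k$ and satisfy
$r_i - s_{i-1} < \delta$, $i = 2, \ldots, k$, and $s_k > m$. It will be shown that

(38.18)  $$J_D = \bigcap_{m=1}^{\infty} \bigcap_{\epsilon} \bigcup_{k=1}^{\infty} \bigcap_{\delta} \bigcup_{V \in \mathcal{V}_{m,k,\delta}} J(D, V, \epsilon),$$

where $\epsilon$ and $\delta$ range over the positive rationals. From this it will follow that $J_D \in \bar{\mathcal{R}}^T$.
It will also be shown that $J_D \cap S_D \subset J$, so that $J$ satisfies the hypothesis of Theorem
38.2.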

Suppose that $y \in J$. For fixed $\epsilon$, let $H$ be the set of nonnegative $h$ for which there
exist finitely many points $t_i$ such that $0 = t_0 < t_1 < \cdots < t_r = h$ and $|y(t) - y(t')| < \epsilon$
for $t$ and $t'$ in the same interval $(t_{i-1}, t_i)$. If $h_n \in H$ and $h_n \uparrow h$, then from the
existence of $y(h-)$ follows $h \in H$. Hence $H$ is closed. If $h \in H$, from the existence of
$y(h+)$ it follows that $H$ contains points to the right of $h$. Therefore, $H = [0, \infty)$. From
this it follows that the right side of (38.18) contains $J_D$.

Suppose that $x$ is a member of the right side of (38.18). It is not hard to deduce
that for each $t$ the limits

(38.19)  $$\lim_{s \downarrow t,\ s \in D} x(s), \qquad \lim_{s \uparrow t,\ s \in D} x(s)$$

exist and that $x(t)$ lies between them if $t \in D$. For $t \in D$ take $y(t) = x(t)$, and for
$t \notin D$ take $y(t)$ to be the first limit in (38.19). Then $y \in J$ and hence $x \in J_D$. This
argument also shows that $J_D \cap S_D \subset J$. ■


Appendix 


Gathered here for easy reference are certain definitions and results from set theory
and real analysis required in the text. Although there are many newer books,
HAUSDORFF (the early sections) on set theory and HARDY on analysis are still
excellent for the general background assumed here.


Set Theory 


A1. The empty set is denoted by $\emptyset$. Sets are variable subsets of some space that is
fixed in any one definition, argument, or discussion; this space is denoted either
generically by $\Omega$ or by some special symbol (such as $R^k$ for Euclidean $k$-space). A
singleton is a set consisting of just one point or element. That $A$ is a subset of $B$ is
expressed by $A \subset B$. In accordance with standard usage, $A \subset B$ does not preclude
$A = B$; $A$ is a proper subset of $B$ if $A \subset B$ and $A \ne B$.

The complement of $A$ is always relative to the overall space $\Omega$; it consists of the
points of $\Omega$ not contained in $A$ and is denoted by $A^c$. The difference between $A$ and
$B$, denoted by $A - B$, is $A \cap B^c$; here $B$ need not be contained in $A$, and if it is, then
$A - B$ is a proper difference. The symmetric difference $A \triangle B = (A \cap B^c) \cup (A^c \cap B)$
consists of the points that lie in one of the sets $A$ and $B$ but not in both.

Classes of sets are denoted by script letters. The power set of $\Omega$ is the class of all
subsets of $\Omega$; it is denoted $2^\Omega$.


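A2. The set of $\omega$ that lie in $A$ and satisfy a given property $p(\omega)$ is denoted $[\omega \in A\colon p(\omega)]$;
if $A = \Omega$, this is usually shortened to $[\omega\colon p(\omega)]$.


A3. In this book, to say that a collection $[A_\theta\colon \theta \in \Theta]$ is disjoint always means that it
is pairwise disjoint: $A_\theta \cap A_{\theta'} = \emptyset$ if $\theta$ and $\theta'$ are distinct elements of the index set $\Theta$.
To say that $A$ meets $B$, or that $B$ meets $A$, is to say that they are not disjoint:
$A \cap B \ne \emptyset$. The collection $[A_\theta\colon \theta \in \Theta]$ covers $B$ if $B \subset \bigcup_\theta A_\theta$. The collection is a
decomposition or partition of $B$ if it is disjoint and $B = \bigcup_\theta A_\theta$.


A4. By $A_n \uparrow A$ is meant $A_1 \subset A_2 \subset \cdots$ and $A = \bigcup_n A_n$; by $A_n \downarrow A$ is meant
$A_1 \supset A_2 \supset \cdots$ and $A = \bigcap_n A_n$.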


A5. The indicator, or indicator function, of a set $A$ is the function on $\Omega$ that
assumes the value 1 on $A$ and 0 on $A^c$; it is denoted $I_A$. The alternative term
“characteristic function” is reserved for the Fourier transform (see Section 26).




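A6. De Morgan’s laws are $(\bigcup_\theta A_\theta)^c = \bigcap_\theta A_\theta^c$ and $(\bigcap_\theta A_\theta)^c = \bigcup_\theta A_\theta^c$. These and the
other facts of basic set theory are assumed known: a countable union of countable
sets is countable, and so on.


A7. If $T\colon \Omega \to \Omega'$ is a mapping of $\Omega$ into $\Omega'$ and $A'$ is a set in $\Omega'$, the inverse image
of $A'$ is $T^{-1}A' = [\omega \in \Omega\colon T\omega \in A']$. It is easily checked that each of these statements is
equivalent to the next: $\omega \in \Omega - T^{-1}A'$, $\omega \notin T^{-1}A'$, $T\omega \notin A'$, $T\omega \in \Omega' - A'$,
$\omega \in T^{-1}(\Omega' - A')$. Therefore, $\Omega - T^{-1}A' = T^{-1}(\Omega' - A')$. Simple considerations of this kind show
that $\bigcup_\theta T^{-1}A'_\theta = T^{-1}(\bigcup_\theta A'_\theta)$ and $\bigcap_\theta T^{-1}A'_\theta = T^{-1}(\bigcap_\theta A'_\theta)$, and that $A' \cap B' = \emptyset$
implies $T^{-1}A' \cap T^{-1}B' = \emptyset$ (the reverse implication is false unless $T\Omega = \Omega'$).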

If $f$ maps $\Omega$ into another space, $f(\omega)$ is the value of the function $f$ at an
unspecified value of the argument $\omega$. The function $f$ itself (the rule defining the
mapping) is sometimes denoted $f(\cdot)$. This is especially convenient for a function
$f(\omega, t)$ of two arguments: For each fixed $t$, $f(\cdot, t)$ denotes the function on $\Omega$ with
value $f(\omega, t)$ at $\omega$.
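
The inverse-image rules of A7 can be checked mechanically on a small finite space. The following is a minimal sketch, not part of the text: the space {-3,...,3}, the target space {0,...,9}, the map w -> w^2, and the names `inverse_image`, `square`, `a_prime`, `b_prime` are all assumptions chosen for illustration.

```python
def inverse_image(T, omega, a_prime):
    """Return T^{-1}A' = [w in omega: T(w) in A'] for a finite space omega."""
    return {w for w in omega if T(w) in a_prime}


def square(w):
    """Illustrative map T w = w^2; it is not onto omega_prime below."""
    return w * w


# Hypothetical finite spaces, chosen only for this illustration.
omega = set(range(-3, 4))
omega_prime = set(range(10))
a_prime = {0, 1, 4}
b_prime = {4, 9}

# Complements: omega - T^{-1}A' equals T^{-1}(omega' - A').
lhs_complement = omega - inverse_image(square, omega, a_prime)
rhs_complement = inverse_image(square, omega, omega_prime - a_prime)

# Unions and intersections pass through T^{-1}.
union_ok = (
    inverse_image(square, omega, a_prime) | inverse_image(square, omega, b_prime)
    == inverse_image(square, omega, a_prime | b_prime)
)
intersection_ok = (
    inverse_image(square, omega, a_prime) & inverse_image(square, omega, b_prime)
    == inverse_image(square, omega, a_prime & b_prime)
)

# Since T omega != omega', disjoint inverse images do not force disjoint sets:
# {2, 3} and {3, 5} share the point 3, yet both inverse images are empty.
reverse_fails = (
    not (inverse_image(square, omega, {2, 3}) & inverse_image(square, omega, {3, 5}))
    and bool({2, 3} & {3, 5})
)
```

The last check illustrates the parenthetical remark in A7: the point 3 is not a square, so it is invisible to inverse images under this particular map.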


A8. The axiom of choice. Suppose that $[A_\theta\colon \theta \in \Theta]$ is a decomposition of $\Omega$ into
nonempty sets. The axiom of choice says that there exists a set (at least one set) $C$
that contains exactly one point from each $A_\theta$: $C \cap A_\theta$ is a singleton for each $\theta$ in $\Theta$.
The existence of such sets $C$ is assumed in “everyday” mathematics, and the axiom of
choice may even seem to be simply true. A careful treatment of set theory, however, is
based on an explicit list of such axioms and a study of the relationships between them;
see HALMOS² or DUDLEY.

A few of the problems require Zorn's lemma, which is equivalent to the axiom of
choice; see DUDLEY or KAPLANSKY.


The Real Line 


A9. The real line is denoted by $R^1$; $x \vee y = \max\{x, y\}$ and $x \wedge y = \min\{x, y\}$. For
real $x$, $\lfloor x \rfloor$ is the integer part of $x$, and $\operatorname{sgn} x$ is $+1$, $0$, or $-1$ as $x$ is positive, 0, or
negative. It is convenient to be explicit about open, closed, and half-open intervals:


$(a, b) = [x\colon a < x < b]$,
$[a, b] = [x\colon a \le x \le b]$,
$(a, b] = [x\colon a < x \le b]$,
$[a, b) = [x\colon a \le x < b]$.


A10. Of course $x_n \to x$ means $\lim_n x_n = x$; $x_n \uparrow x$ means $x_1 \le x_2 \le \cdots$ and $x_n \to x$;
$x_n \downarrow x$ means $x_1 \ge x_2 \ge \cdots$ and $x_n \to x$.

A sequence $\{x_n\}$ is bounded if and only if every subsequence $\{x_{n_k}\}$ contains a further
subsequence $\{x_{n_{k(j)}}\}$ that converges to some $x$: $\lim_j x_{n_{k(j)}} = x$. If $\{x_n\}$ is not bounded,
then for each $k$ there is an $n_k$ for which $|x_{n_k}| > k$; no subsequence of $\{x_{n_k}\}$ can
converge. The implication in the other direction is a simple consequence of the fact
that every bounded sequence contains a convergent subsequence.

If $\{x_n\}$ is bounded, and if each subsequence that converges at all converges to $x$, then
$\lim_n x_n = x$. If $x_n$ does not converge to $x$, then $|x_{n_k} - x| > \epsilon$ for some positive $\epsilon$ and
some increasing sequence $\{n_k\}$ of integers; some subsequence of $\{x_{n_k}\}$ converges, but
the limit cannot be $x$.


A11. A set $G$ is defined as open if for each $x$ in $G$ there is an open interval $I$ such
that $x \in I \subset G$. A set $F$ is defined as closed if $F^c$ is open. The interior of $A$, denoted
$A^\circ$, consists of the $x$ in $A$ for which there exists an open interval $I$ such that
$x \in I \subset A$. The closure of $A$, denoted $A^-$, consists of the $x$ for which there exists a
sequence $\{x_n\}$ in $A$ with $x_n \to x$. The boundary of $A$ is $\partial A = A^- - A^\circ$. The basic facts
of real analysis are assumed known: $A$ is open if and only if $A = A^\circ$; $A$ is closed if
and only if $A = A^-$; $A$ is closed if and only if it contains all limits of sequences in it; $x$
lies in $\partial A$ if and only if there is a sequence $\{x_n\}$ in $A$ and a sequence $\{y_n\}$ in $A^c$ such
that $x_n \to x$ and $y_n \to x$; and so on.


A12. Every open set $G$ on the line is a countable, disjoint union of open intervals.
To see this, define points $x$ and $y$ of $G$ to be equivalent if $x \le y$ and $[x, y] \subset G$ or
$y \le x$ and $[y, x] \subset G$. This is an equivalence relation. Each equivalence class is an
interval, and since $G$ is open, each is in fact an open interval. Thus $G$ is a disjoint
union of open (nonempty) intervals, and there can be only countably many of them,
since each contains a rational.


A13. The simplest form of the Heine-Borel theorem says that if $[a, b] \subset
\bigcup_{k=1}^{\infty} (a_k, b_k)$, then $[a, b] \subset \bigcup_{k=1}^{n} (a_k, b_k)$ for some $n$. A set $A$ is defined to be
compact if each cover of it by open sets has a finite subcover, that is, if $[G_\theta\colon \theta \in \Theta]$
covers $A$ and each $G_\theta$ is open, then some finite subcollection $\{G_{\theta_1}, \ldots, G_{\theta_n}\}$ covers $A$.
Equivalent to the Heine-Borel theorem is the assertion that a bounded, closed set is
compact. Also equivalent is the assertion that every bounded sequence of real
numbers has a convergent subsequence.


A14. The diagonal method. From this last fact follows one of the basic principles of
analysis.


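Theorem. Suppose that each row of the array

(1)  $x_{1,1},\ x_{1,2},\ x_{1,3},\ \ldots$
     $x_{2,1},\ x_{2,2},\ x_{2,3},\ \ldots$
     $\ldots\ldots\ldots\ldots\ldots$

is a bounded sequence of real numbers. Then there exists an increasing sequence
$n_1, n_2, \ldots$ of integers such that the limit $\lim_k x_{r,n_k}$ exists for $r = 1, 2, \ldots$.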


PROOF. From the first row, select a convergent subsequence

(2)  $x_{1,n_{1,1}},\ x_{1,n_{1,2}},\ x_{1,n_{1,3}},\ \ldots$;

here $\{n_{1,k}\}$ is an increasing sequence of integers and $\lim_k x_{1,n_{1,k}}$ exists. Look next at
the second row of (1) along the sequence $n_{1,1}, n_{1,2}, \ldots$:

(3)  $x_{2,n_{1,1}},\ x_{2,n_{1,2}},\ x_{2,n_{1,3}},\ \ldots$.

As a subsequence of the second row of (1), (3) is bounded. Select from it a convergent
subsequence

$x_{2,n_{2,1}},\ x_{2,n_{2,2}},\ x_{2,n_{2,3}},\ \ldots$;

here $\{n_{2,k}\}$ is an increasing sequence of integers, a subsequence of $\{n_{1,k}\}$, and
$\lim_k x_{2,n_{2,k}}$ exists.




Continue inductively in the same way. This gives an array

(4)  $n_{1,1},\ n_{1,2},\ n_{1,3},\ \ldots$
     $n_{2,1},\ n_{2,2},\ n_{2,3},\ \ldots$
     $\ldots\ldots\ldots\ldots\ldots$

with three properties: (i) Each row of (4) is an increasing sequence of integers. (ii)
The $r$th row is a subsequence of the $(r-1)$st. (iii) For each $r$, $\lim_k x_{r,n_{r,k}}$ exists. Thus

(5)  $x_{r,n_{r,1}},\ x_{r,n_{r,2}},\ x_{r,n_{r,3}},\ \ldots$

is a convergent subsequence of the $r$th row of (1).

Put $n_k = n_{k,k}$. Since each row of (4) is increasing and is contained in the preceding
row, $n_1, n_2, n_3, \ldots$ is an increasing sequence of integers. Furthermore,
$n_r, n_{r+1}, n_{r+2}, \ldots$ is a subsequence of the $r$th row of (4). Thus $x_{r,n_r}, x_{r,n_{r+1}}, x_{r,n_{r+2}}, \ldots$
is a subsequence of (5) and is therefore convergent. Thus $\lim_k x_{r,n_k}$ does exist. ∎


Since {n,} is the diagonal of the array (4), application of this theorem is called the 
diagonal method. 


A15. The set $A$ is by definition dense in the set $B$ if for each $x$ in $B$ and each open
interval $J$ containing $x$, $J$ meets $A$. This is the same thing as requiring $B \subset A^-$. The
set $E$ is by definition nowhere dense if each open interval $I$ contains some open
interval $J$ that does not meet $E$. This makes sense: If $I$ contains an open interval $J$
that does not meet $E$, then $E$ is not dense in $I$; the definition requires that $E$ be
dense in no interval $I$.

A set $A$ is defined to be perfect if it is closed and for each $x$ in $A$ and positive $\epsilon$,
there is a $y$ in $A$ such that $0 < |x - y| < \epsilon$. An equivalent requirement is that $A$ be
closed and for each $x$ in $A$ there exist a sequence $\{x_n\}$ in $A$ such that $x_n \ne x$ and
$x_n \to x$. The Cantor set is uncountable, nowhere dense, and perfect.

A set that is nowhere dense is in a sense small. If A is a countable union of sets 
each of which is nowhere dense, then A is said to be of the first category. This is a 
weaker notion of smallness. A set that is not of the first category is said to be of the 
second category. 


Euclidean k-Space 


A16. Euclidean space of dimension $k$ is denoted $R^k$. Points $(a_1, \ldots, a_k)$ and
$(b_1, \ldots, b_k)$ determine open, closed, and half-open rectangles in $R^k$:


$[x\colon a_i < x_i < b_i,\ i = 1, \ldots, k]$,
$[x\colon a_i \le x_i \le b_i,\ i = 1, \ldots, k]$,
$[x\colon a_i < x_i \le b_i,\ i = 1, \ldots, k]$.


A rectangle (without a qualifying adjective) is in this book a set of this last form.
The Euclidean distance $(\sum_{i=1}^{k} (x_i - y_i)^2)^{1/2}$ between $x = (x_1, \ldots, x_k)$ and
$y = (y_1, \ldots, y_k)$ is denoted by $|x - y|$.


A17. All the concepts in A11 carry over to $R^k$: simply take the $I$ there to be an open
rectangle in $R^k$. The definition of compact set also carries over word for word, and
the Heine-Borel theorem in $R^k$ says that a closed, bounded set is compact.




Analysis 


A18. The standard Landau notation is used. Suppose that $\{x_n\}$ and $\{y_n\}$ are real
sequences and $y_n > 0$. Then $x_n = O(y_n)$ means $x_n / y_n$ is bounded; $x_n = o(y_n)$ means
$x_n / y_n \to 0$; $x_n \sim y_n$ means $x_n / y_n \to 1$; $x_n \asymp y_n$ means $x_n / y_n$ and $y_n / x_n$ are both
bounded. To write $x_n = z_n + O(y_n)$, for example, means that $x_n = z_n + u_n$ for some
$\{u_n\}$ satisfying $u_n = O(y_n)$, that is, that $x_n - z_n = O(y_n)$.


A19. A difference equation. Suppose that $a$ and $b$ are integers and $a < b$. Suppose
that $x_n$ is defined for $a \le n \le b$ and satisfies

(6)  $x_n = p x_{n+1} + q x_{n-1}$ for $a < n < b$,

where $p$ and $q$ are positive and $p + q = 1$. The general solution of this difference
equation has the form

(7)  $x_n = A + B(q/p)^n$ for $a \le n \le b$ if $p \ne q$,
     $x_n = A + Bn$ for $a \le n \le b$ if $p = q$.

That (7) always solves (6) is easily checked. Suppose the values $x_{n_1}$ and $x_{n_2}$ are given,
where $a \le n_1 < n_2 \le b$. If $p \ne q$, the system

$A + B(q/p)^{n_1} = x_{n_1}, \qquad A + B(q/p)^{n_2} = x_{n_2}$

can always be solved for $A$ and $B$. If $p = q$, the system

$A + Bn_1 = x_{n_1}, \qquad A + Bn_2 = x_{n_2}$

can always be solved. Take $n_1 = a$ and $n_2 = a + 1$; the corresponding $A$ and $B$ satisfy
(7) for $n = a$ and for $n = a + 1$, and it follows by induction that (7) holds for all $n$.
Thus any solution of (6) can indeed be put in the form (7). Furthermore, the equation
(6) and any pair of values $x_{n_1}$ and $x_{n_2}$ ($n_1 \ne n_2$) suffice to determine all the $x_n$.

If $x_n$ is defined for $a \le n < \infty$ and satisfies (6) for $a < n < \infty$, then there are
constants $A$ and $B$ such that (7) holds for $a \le n < \infty$.
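
The way two prescribed values pin down A and B in (7) can be sketched in exact rational arithmetic. This is only an illustration under stated assumptions: the function name `general_solution` and the boundary values (0 at n = 0, 1 at n = 10, with p = 1/3) are hypothetical and do not come from the text.

```python
from fractions import Fraction


def general_solution(p, n1, x1, n2, x2):
    """Return n -> x_n of the form (7) with x_{n1} = x1 and x_{n2} = x2.

    Assumes 0 < p < 1, q = 1 - p, and n1 != n2.
    """
    p = Fraction(p)
    q = 1 - p
    if p != q:
        rho = q / p
        # Solve A + B*rho^n1 = x1, A + B*rho^n2 = x2 for A and B.
        B = (x2 - x1) / (rho ** n2 - rho ** n1)
        A = x1 - B * rho ** n1
        return lambda n: A + B * rho ** n
    # Symmetric case p = q = 1/2: solve A + B*n1 = x1, A + B*n2 = x2.
    B = Fraction(x2 - x1) / (n2 - n1)
    A = x1 - B * n1
    return lambda n: A + B * n


# Hypothetical boundary data: x_0 = 0 and x_10 = 1.
p = Fraction(1, 3)
q = 1 - p
x = general_solution(p, 0, 0, 10, 1)

# Check the difference equation (6) at every interior point.
satisfies_6 = all(x(n) == p * x(n + 1) + q * x(n - 1) for n in range(1, 10))
```

Using `Fraction` rather than floating point makes the check of (6) an exact equality instead of a comparison up to rounding.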


A20. Cauchy’s equation. 


Theorem. Let $f$ be a real function on $(0, \infty)$, and suppose that $f$ satisfies Cauchy's
equation: $f(x + y) = f(x) + f(y)$ for $x, y > 0$. If there is some interval on which $f$ is
bounded above, then $f(x) = x f(1)$ for $x > 0$.


PROOF. The problem is to prove that $g(x) = f(x) - x f(1)$ vanishes identically.
Clearly, $g(1) = 0$, and $g$ satisfies Cauchy's equation and on some interval is bounded
above. By induction, $g(nx) = n g(x)$; hence $n g(m/n) = g(m) = m g(1) = 0$, so that
$g(r) = 0$ for positive rational $r$. Suppose that $g(x_0) \ne 0$ for some $x_0$. If $g(x_0) < 0$,
then $g(r_0 - x_0) = -g(x_0) > 0$ for rational $r_0 > x_0$. It is thus no restriction to assume
that $g(x_0) > 0$. Let $I$ be an open interval in which $g$ is bounded above. Given a
number $M$, choose $n$ so that $n g(x_0) > M$, and then choose a rational $r$ so that $n x_0 + r$
lies in $I$. If $r > 0$, then $g(r + n x_0) = g(r) + g(n x_0) = g(n x_0) = n g(x_0)$. If $r < 0$, then
$n g(x_0) = g(n x_0) = g((-r) + (n x_0 + r)) = g(-r) + g(n x_0 + r) = g(n x_0 + r)$. In either
case, $g(n x_0 + r) = n g(x_0)$; of course this is trivial if $r = 0$. Since $g(n x_0 + r) =
n g(x_0) > M$ and $M$ was arbitrary, $g$ is not bounded above in $I$, a contradiction. ∎


Obviously, the same proof works if f is bounded below in some interval. 


Corollary. Let $U$ be a real function on $(0, \infty)$ and suppose that $U(x + y) = U(x)U(y)$
for $x, y > 0$. Suppose further that there is some interval on which $U$ is bounded above.
Then either $U(x) = 0$ for $x > 0$, or else there is an $A$ such that $U(x) = e^{Ax}$ for $x > 0$.


PROOF. Since $U(x) = U^2(x/2)$, $U$ is nonnegative. If $U(x) = 0$, then $U(x/2^n) = 0$
and so $U$ vanishes at points arbitrarily near 0. If $U$ vanishes at a point, it must by the
functional equation vanish everywhere to the right of that point. Hence $U$ is
identically 0 or else everywhere positive.

In the latter case, the theorem applies to $f(x) = \log U(x)$, this function being
bounded above in some interval, and so $f(x) = Ax$ for $A = \log U(1)$. ∎


A21. A number-theoretic fact. 


Theorem. Suppose that M is a set of positive integers closed under addition and that 
$M$ has greatest common divisor 1. Then $M$ contains all integers exceeding some $n_0$.


PROOF. Let $M_1$ consist of all the integers $m$, $-m$, and $m - m'$ with $m$ and $m'$ in
$M$. Then $M_1$ is closed under addition and subtraction (it is a subgroup of the group of
integers). Let $d$ be the smallest positive element of $M_1$. If $n \in M_1$, write $n = qd + r$,
where $0 \le r < d$. Since $r = n - qd$ lies in $M_1$, $r$ must actually be 0. Thus $M_1$ consists
of the multiples of $d$. Since $d$ divides all the integers in $M_1$ and hence all the integers
in $M$, and since $M$ has greatest common divisor 1, $d = 1$. Thus $M_1$ contains all the
integers.

Write $1 = m - m'$ with $m$ and $m'$ in $M$ (if 1 itself is in $M$, the proof is easy), and
take $n_0 = (m + m')^2$. Given $n > n_0$, write $n = q(m + m') + r$, where $0 \le r < m + m'$.
From $n > n_0 \ge (r + 1)(m + m')$ follows $q = (n - r)/(m + m') > r$. But $n =
q(m + m') + r(m - m') = (q + r)m + (q - r)m'$, and since $q + r \ge q - r > 0$, $n$ lies
in $M$. ∎
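
For a concrete feel for the bound $n_0 = (m + m')^2$ in the proof, one can generate an additively closed set by brute force. The sketch below assumes the hypothetical generators 5 and 7 and a search limit of 1000; the helper names `additive_closure` and `bound_from_proof` are illustrative only.

```python
def additive_closure(generators, limit):
    """Positive integers <= limit expressible as sums of generators (repetition allowed)."""
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty sum, used only for bookkeeping
    for n in range(1, limit + 1):
        reachable[n] = any(g <= n and reachable[n - g] for g in generators)
    return {n for n in range(1, limit + 1) if reachable[n]}


def bound_from_proof(members):
    """Pick the smallest m' with m' and m = m' + 1 both in M; return (m + m')^2."""
    m_prime = min(n for n in members if n + 1 in members)
    return (2 * m_prime + 1) ** 2


# Hypothetical example: M generated by 5 and 7, which have greatest common divisor 1.
M = additive_closure([5, 7], 1000)
n0 = bound_from_proof(M)
largest_missing = max(n for n in range(1, 1001) if n not in M)
```

In this example the proof's bound is far from sharp: the largest integer missing from the generated set is much smaller than $n_0$, which is all the theorem needs.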


A22. One- and two-sided derivatives. 


Theorem. Suppose that $f$ and $g$ are continuous on $[0, \infty)$ and $g$ is the right-hand
derivative of $f$ on $(0, \infty)$: $f^+(t) = g(t)$ for $t > 0$. Then $f^+(0) = g(0)$ as well, and $g$ is the
two-sided derivative of $f$ on $(0, \infty)$.


PROOF. It suffices to show that $F(t) = f(t) - f(0) - \int_0^t g(s)\,ds$ vanishes for $t > 0$.
By assumption, $F$ is continuous on $[0, \infty)$ and $F^+(t) = 0$ for $t > 0$. Suppose that
$F(t_0) > F(t_1)$, where $0 < t_0 < t_1$. Then $G(t) = F(t) - (t - t_0)(F(t_1) - F(t_0))/(t_1 - t_0)$
is continuous on $[0, \infty)$, $G(t_0) = G(t_1)$, and $G^+(t) > 0$ on $(0, \infty)$. But then the maximum
of $G$ over $[t_0, t_1]$ must occur at some interior point; since $G^+ \le 0$ at a local maximum,
this is impossible. Similarly $F(t_0) < F(t_1)$ is impossible. Thus $F$ is constant over $(0, \infty)$
and by continuity is constant over $[0, \infty)$. Since $F(0) = 0$, $F$ vanishes on $[0, \infty)$. ∎


A23. A differential equation. The equation $f'(t) = A f(t) + g(t)$ ($t \ge 0$; $g$ continuous)
has the particular solution $f_0(t) = e^{At} \int_0^t g(s) e^{-As}\,ds$; for an arbitrary solution $f$,
$(f(t) - f_0(t)) e^{-At}$ has derivative 0 and hence equals $f(0)$ identically. All solutions
thus have the form $f(t) = e^{At} [f(0) + \int_0^t g(s) e^{-As}\,ds]$.




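A24. A trigonometric identity. If $z \ne 1$ and $z \ne 0$, then

$\sum_{k=-l}^{l} z^k = z^{-l} \sum_{k=0}^{2l} z^k = z^{-l}\,\frac{1 - z^{2l+1}}{1 - z} = \frac{z^{-l} - z^{l+1}}{1 - z}$,

and hence

$\sum_{l=0}^{m-1} \sum_{k=-l}^{l} z^k = \frac{1}{1 - z} \sum_{l=0}^{m-1} z^{-l} - \frac{z}{1 - z} \sum_{l=0}^{m-1} z^{l}
 = \frac{1}{1 - z} \Bigl[ \frac{1 - z^{-m}}{1 - z^{-1}} - z\,\frac{1 - z^{m}}{1 - z} \Bigr]$

$\qquad = \frac{(1 - z^{m})(1 - z^{-m})}{(1 - z)(1 - z^{-1})} = \frac{(z^{m/2} - z^{-m/2})^2}{(z^{1/2} - z^{-1/2})^2}$.

Take $z = e^{ix}$. If $x$ is not an integral multiple of $2\pi$, then

$\sum_{l=0}^{m-1} \sum_{k=-l}^{l} e^{ikx} = \frac{(\sin \frac{1}{2} m x)^2}{(\sin \frac{1}{2} x)^2}$.

If $x = 2\pi n$, the left-hand side here is $m^2$, which is the limit of the right-hand side as
$x \to 2\pi n$.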
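
A numerical spot check of the identity in A24 is easy to set up. The sketch below is illustrative only: it assumes the closed form $(\sin \tfrac12 mx)^2 / (\sin \tfrac12 x)^2$ for $z = e^{ix}$, uses the fact that the symmetric sum over $k$ is real and equals a sum of cosines, and the function names are hypothetical.

```python
import math


def double_sum(m, x):
    """Left side: sum over l = 0..m-1 and k = -l..l of e^{ikx}, which is real."""
    return sum(math.cos(k * x) for l in range(m) for k in range(-l, l + 1))


def closed_form(m, x):
    """Right side: (sin(m x / 2) / sin(x / 2))^2, for x not a multiple of 2*pi."""
    return (math.sin(m * x / 2) / math.sin(x / 2)) ** 2


# Hypothetical sample values of m and x.
samples = [(1, 0.3), (5, 0.7), (8, 2.5), (12, -1.1)]
agree = all(math.isclose(double_sum(m, x), closed_form(m, x), rel_tol=1e-9, abs_tol=1e-9)
            for m, x in samples)
```

At $x = 0$ every term of the double sum is 1, so `double_sum(m, 0.0)` counts $\sum_{l=0}^{m-1} (2l + 1) = m^2$ terms, matching the limiting value stated in the text.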


Infinite Series 


A25. Nonnegative series. Suppose $x_1, x_2, \ldots$ are nonnegative. If $E$ is a finite set of
integers, then $E \subset \{1, 2, \ldots, n\}$ for some $n$, so that by nonnegativity $\sum_{k \in E} x_k \le \sum_{k=1}^{n} x_k$.
The set of partial sums $\sum_{k=1}^{n} x_k$ thus has the same supremum as the larger set of sums
$\sum_{k \in E} x_k$ ($E$ finite). Therefore, the nonnegative series $\sum_{k=1}^{\infty} x_k$ converges if and only if
the sums $\sum_{k \in E} x_k$ for finite $E$ are bounded, in which case the sum is the supremum:

$\sum_{k=1}^{\infty} x_k = \sup_E \sum_{k \in E} x_k$.


A26. Dirichlet's theorem. Since the supremum in A25 is invariant under permutations,
so is $\sum_{k=1}^{\infty} x_k$: If the $x_k$ are nonnegative and $y_k = x_{f(k)}$ for some one-to-one
map $f$ of the positive integers onto themselves, then $\sum_k x_k$ and $\sum_k y_k$ diverge or
converge together and in the latter case have the same sum.


A27. Double series. Suppose that $x_{ij}$, $i, j = 1, 2, \ldots$, are nonnegative. The $i$th row
gives a series $\sum_j x_{ij}$, and if each of these converges, one can form the series $\sum_i \sum_j x_{ij}$.
Let the terms $x_{ij}$ be arranged in some order as a single infinite series $\sum_{ij} x_{ij}$; by
Dirichlet's theorem, the sum is the same whatever order is used.

Suppose each $\sum_j x_{ij}$ converges and $\sum_i \sum_j x_{ij}$ converges. If $E$ is a finite set of the
pairs $(i, j)$, there is an $n$ for which $\sum_{(i,j) \in E} x_{ij} \le \sum_{i \le n} \sum_{j \le n} x_{ij} \le \sum_{i \le n} \sum_j x_{ij} \le \sum_i \sum_j x_{ij}$;
hence $\sum_{ij} x_{ij}$ converges and has sum at most $\sum_i \sum_j x_{ij}$. On the other hand, if $\sum_{ij} x_{ij}$
converges, then $\sum_{i \le m} \sum_{j \le n} x_{ij} \le \sum_{ij} x_{ij}$; letting $n \to \infty$ and then $m \to \infty$ shows that
each $\sum_j x_{ij}$ converges and that $\sum_i \sum_j x_{ij} \le \sum_{ij} x_{ij}$. Therefore, in the nonnegative case,
$\sum_{ij} x_{ij}$ converges if and only if the $\sum_j x_{ij}$ all converge and $\sum_i \sum_j x_{ij}$ converges, in which
case $\sum_{ij} x_{ij} = \sum_i \sum_j x_{ij}$.

By symmetry, $\sum_{ij} x_{ij} = \sum_j \sum_i x_{ij}$. Thus the order of summation can be reversed in a
nonnegative double series: $\sum_i \sum_j x_{ij} = \sum_j \sum_i x_{ij}$.


A28. The Weierstrass M-test. 




Theorem. Suppose that $\lim_n x_{nk} = x_k$ for each $k$ and that $|x_{nk}| \le M_k$, where
$\sum_k M_k < \infty$. Then $\sum_k x_k$ and all the $\sum_k x_{nk}$ converge, and $\lim_n \sum_k x_{nk} = \sum_k x_k$.


PROOF. The series of course converge absolutely, since $\sum_k M_k < \infty$. Now
$|\sum_k x_{nk} - \sum_k x_k| \le \sum_{k \le k_0} |x_{nk} - x_k| + 2 \sum_{k > k_0} M_k$. Given $\epsilon$, choose $k_0$ so that
$\sum_{k > k_0} M_k < \epsilon/3$, and then choose $n_0$ so that $n > n_0$ implies $|x_{nk} - x_k| < \epsilon/3k_0$ for
$k \le k_0$. Then $n > n_0$ implies $|\sum_k x_{nk} - \sum_k x_k| < \epsilon$. ∎


A29. Power series. The principal fact needed is this: If $f(x) = \sum_{k=0}^{\infty} a_k x^k$ converges
in the range $|x| < r$, then it is differentiable there and

(8)  $f'(x) = \sum_{k=1}^{\infty} k a_k x^{k-1}$.

For a simple proof, choose $r_0$ and $r_1$ so that $|x| < r_0 < r_1 < r$. If $|h| < r_0 - |x|$, so that
$|x \pm h| < r_0$, then the mean-value theorem gives (here $0 \le \theta_h \le 1$)

(9)  $\Bigl| \frac{(x + h)^k - x^k}{h} - k x^{k-1} \Bigr| = \bigl| k (x + \theta_h h)^{k-1} - k x^{k-1} \bigr| \le 2 k r_0^{k-1}$.

Since $2 k r_0^{k-1} / r_1^k$ goes to 0, it is bounded by some $M$, and if $M_k = |a_k| \cdot M r_1^k$, then
$\sum_k M_k < \infty$ and $|a_k|$ times the left member of (9) is at most $M_k$ for $|h| < r_0 - |x|$. By
the M-test [A28] (applied with $h \to 0$ instead of $n \to \infty$),

$\lim_{h \to 0} \sum_k a_k \Bigl[ \frac{(x + h)^k - x^k}{h} - k x^{k-1} \Bigr] = 0$.

Hence (8).
Repeated application of (8) gives

$f^{(j)}(x) = \sum_{k=j}^{\infty} k(k - 1) \cdots (k - j + 1) a_k x^{k-j}$.

For $x = 0$, this is $a_j = f^{(j)}(0)/j!$, the formula for the coefficients in a Taylor series.
This shows in particular that the values of $f(x)$ for $|x| < r$ determine the coefficients
$a_k$.
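
Termwise differentiation as in (8) can be illustrated on the geometric series, where both sides are known in closed form. This is a sketch under assumptions: all coefficients $a_k$ are taken to be 1 (so $r = 1$), the series are truncated at 200 terms, and the function names are hypothetical.

```python
def geometric(x, terms=200):
    """Partial sum of f(x) = sum_{k>=0} x^k (all a_k = 1), meaningful for |x| < 1."""
    return sum(x ** k for k in range(terms))


def geometric_derivative(x, terms=200):
    """Partial sum of the termwise derivative sum_{k>=1} k x^(k-1), as in (8)."""
    return sum(k * x ** (k - 1) for k in range(1, terms))


x = 0.5
h = 1e-6
# Symmetric difference quotient of the series itself, for comparison with (8).
difference_quotient = (geometric(x + h) - geometric(x - h)) / (2 * h)
termwise = geometric_derivative(x)
exact = 1.0 / (1.0 - x) ** 2  # derivative of 1/(1 - x)
```

Here the termwise sum, the difference quotient, and the closed-form derivative of $1/(1 - x)$ agree to within rounding and truncation error.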


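A30. Cesàro averages. If $x_n \to x$, then $n^{-1} \sum_{k=1}^{n} x_k \to x$. To prove this, let $M$ bound
$|x_k|$, and given $\epsilon$, choose $k_0$ so that $|x - x_k| < \epsilon/2$ for $k \ge k_0$. If $n > k_0$ and
$n > 4 k_0 M / \epsilon$, then

$\Bigl| x - \frac{1}{n} \sum_{k=1}^{n} x_k \Bigr| \le \frac{1}{n} \sum_{k=1}^{k_0 - 1} 2M + \frac{1}{n} \sum_{k=k_0}^{n} \frac{\epsilon}{2} < \epsilon$.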


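A31. Dyadic expansions. Define a mapping $T$ of the unit interval $\Omega = (0, 1]$ into itself
by

$T\omega = 2\omega$ if $0 < \omega \le \frac{1}{2}$,
$T\omega = 2\omega - 1$ if $\frac{1}{2} < \omega \le 1$.

Define a function $d_1$ on $\Omega$ by

$d_1(\omega) = 0$ if $0 < \omega \le \frac{1}{2}$,
$d_1(\omega) = 1$ if $\frac{1}{2} < \omega \le 1$,

and let $d_i(\omega) = d_1(T^{i-1}\omega)$. Then

(10)  $\sum_{i=1}^{n} \frac{d_i(\omega)}{2^i} < \omega \le \sum_{i=1}^{n} \frac{d_i(\omega)}{2^i} + \frac{1}{2^n}$

for all $\omega \in \Omega$ and $n \ge 1$. To verify this for $n = 1$, check the cases $\omega \le \frac{1}{2}$ and $\omega > \frac{1}{2}$
separately. Suppose that (10) holds for a particular $n$ and for all $\omega$. Replace $\omega$ by $T\omega$
in (10) and use the fact that $d_i(T\omega) = d_{i+1}(\omega)$; separate consideration of the cases
$\omega \le \frac{1}{2}$ and $\omega > \frac{1}{2}$ now shows that (10) holds with $n + 1$ in place of $n$.

Thus (10) holds for all $n$ and $\omega$, and it follows that $\omega = \sum_{i=1}^{\infty} d_i(\omega)/2^i$. This gives
the dyadic representation of $\omega$. If $d_i(\omega) = 0$ for all $i > n$, then $\omega = \sum_{i=1}^{n} d_i(\omega)/2^i$,
which contradicts the left-hand inequality in (10). Thus the expansion does not
terminate in 0’s.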
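
The construction of the dyadic digits can be traced in exact arithmetic. The sketch below is illustrative: the function names `shift`, `first_digit`, and `digits` are hypothetical stand-ins for $T$, $d_1$, and the sequence $d_1(\omega), \ldots, d_n(\omega)$, and the sample points are arbitrary.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def shift(w):
    """The map T: 2w on (0, 1/2], 2w - 1 on (1/2, 1]."""
    return 2 * w if w <= HALF else 2 * w - 1


def first_digit(w):
    """The digit d_1: 0 on (0, 1/2], 1 on (1/2, 1]."""
    return 0 if w <= HALF else 1


def digits(w, n):
    """Return [d_1(w), ..., d_n(w)], using d_i(w) = d_1(T^{i-1} w)."""
    out = []
    for _ in range(n):
        out.append(first_digit(w))
        w = shift(w)
    return out


def satisfies_10(w, n):
    """Check inequality (10) for the given w and n."""
    partial = sum(Fraction(d, 2 ** i) for i, d in enumerate(digits(w, n), start=1))
    return partial < w <= partial + Fraction(1, 2 ** n)
```

For example, `digits(Fraction(1, 2), 4)` gives `[0, 1, 1, 1]`: the point 1/2 receives the nonterminating expansion .0111..., not .1000..., in line with the closing remark of A31.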


Convex Functions 
A32. A function $\varphi$ on an open interval $I$ (bounded or unbounded) is convex if

(11)  $\varphi(tx + (1 - t)y) \le t\varphi(x) + (1 - t)\varphi(y)$

for $x, y \in I$ and $0 \le t \le 1$. From this it follows by induction that

(12)  $\varphi\Bigl( \sum_{i=1}^{n} p_i x_i \Bigr) \le \sum_{i=1}^{n} p_i \varphi(x_i)$

if the $x_i$ lie in $I$ and the $p_i$ are nonnegative and add to 1.

If $\varphi$ has a continuous, nondecreasing derivative $\varphi'$ on $I$, then $\varphi$ is convex. Indeed,
if $a < b < c$, the average of $\varphi'$ over $(a, b)$ is at most the average over $(b, c)$:

$\frac{\varphi(b) - \varphi(a)}{b - a} = \frac{1}{b - a} \int_a^b \varphi'(s)\,ds \le \varphi'(b) \le \frac{1}{c - b} \int_b^c \varphi'(s)\,ds = \frac{\varphi(c) - \varphi(b)}{c - b}$.

The inequality between the extreme terms here reduces to

(13)  $(c - a)\varphi(b) \le (c - b)\varphi(a) + (b - a)\varphi(c)$,

which is (11) with $x = a$, $y = c$, $t = (c - b)/(c - a)$.
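
The identification of (13) with (11) can be spelled out in one line of algebra. The following worked step is offered as a check, writing $\varphi$ for the convex function of A32; it uses only the substitution $x = a$, $y = c$, $t = (c - b)/(c - a)$ named there.

```latex
\begin{align*}
t &= \frac{c-b}{c-a}, \qquad 1 - t = \frac{b-a}{c-a}, \qquad
ta + (1-t)c = \frac{(c-b)a + (b-a)c}{c-a} = \frac{b(c-a)}{c-a} = b, \\
\varphi(b) &= \varphi\bigl(ta + (1-t)c\bigr) \le t\,\varphi(a) + (1-t)\,\varphi(c)
\quad\Longleftrightarrow\quad
(c-a)\,\varphi(b) \le (c-b)\,\varphi(a) + (b-a)\,\varphi(c).
\end{align*}
```

The equivalence in the second line is just multiplication by the positive number $c - a$.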


[Figures (i)-(iv): graphs of a convex function, showing (i) points A, B, C above a < b < c, (ii) points D, E, F, G, (iii) points X, Y above x < y, and (iv) the point Z above z.]

A33. Geometrically, (11) means that the point $B$ in Figure (i) lies on or below the
chord $AC$. But then slope $AB \le$ slope $AC$; algebraically, this is $(\varphi(b) - \varphi(a))/(b - a)
\le (\varphi(c) - \varphi(a))/(c - a)$, which is the same as (13). As $B$ moves to $A$ from the right,
slope $AB$ is thus nonincreasing and hence has a limit. In other words, $\varphi$ has a
right-hand derivative $\varphi^+$. Figure (ii) shows that slope $DE \le$ slope $EF \le$ slope $FG$. Let
$E$ move to $D$ from the right and let $G$ move to $F$ from the right: The right-hand
derivative at $D$ is at most that at $F$, and $\varphi^+$ is nondecreasing. Since the slope of $XY$
in Figure (iii) is at least as great as the right-hand derivative at $X$, the curve to the
right of $X$ lies on or above the line through $X$ with slope $\varphi^+(x)$:

(14)  $\varphi(y) \ge \varphi(x) + (y - x)\varphi^+(x), \qquad y \ge x$.

Figure (iii) also makes it clear that $\varphi$ is right-continuous.

Similarly, $\varphi$ has a nondecreasing left-hand derivative $\varphi^-$ and is left-continuous.
Since slope $AB \le$ slope $BC$ in Figure (i), $\varphi^-(b) \le \varphi^+(b)$. Since clearly $\varphi^+(b) < \infty$
and $-\infty < \varphi^-(b)$, $\varphi^+$ and $\varphi^-$ are finite. Finally, (14) and its left-sided analogue
show that the curve lies on or above each line through $Z$ in Figure (iv) having slope
between $\varphi^-(z)$ and $\varphi^+(z)$:

(15)  $\varphi(x) \ge \varphi(z) + m(x - z), \qquad \varphi^-(z) \le m \le \varphi^+(z)$.

This is a support line.


Some Multivariable Calculus 


A34. Suppose that $U$ is an open set in $R^k$ and $T\colon U \to R^k$ is continuously differentiable;
let $D_x = [t_{ij}(x)]$ and $J(x) = \det D_x$ be the Jacobian matrix and determinant, as
in Theorem 17.2. Let $Q$ be a closed rectangle in $U$.

Theorem. If $|t_{ij}(x') - t_{ij}(x)| \le \alpha$ for $x, x' \in Q$ and all $i, j$, then

(16)  $|Tx' - Tx - D_x(x' - x)| \le k^2 \alpha |x' - x|, \qquad x, x' \in Q$.

Before proceeding to the proof, note that, since the $t_{ij}$ are continuous, $\alpha$ can be
taken to go to 0 as $Q$ contracts to the point $x$. In other words, (16) implies

(17)  $\lim_{x' \to x} \frac{|Tx' - Tx - D_x(x' - x)|}{|x' - x|} = 0$.

This shows that $D_x$ acts as a multivariable derivative. Suppose on the other hand that
(17) holds at $x$ for an initially unspecified matrix $D_x$. Take $x'_j = x_j + h$ and $x'_i = x_i$ for
$i \ne j$, and let $h$ go to 0. It follows that the entries of $D_x$ must be the partial derivatives
$t_{ij}(x)$: If (17) holds, then $D_x$ must be the Jacobian matrix.


[Figures (v) and (vi); Figure (vi) shows the image $T(\partial Q)$.]


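PROOF OF (16). For $j = 0, 1, \ldots, k$, let $z^j$ agree with $x'$ in the first $j$ places and
with $x$ in the last $k - j$ places. Then $z^0 = x$, $z^k = x'$, and $|z^j - z^{j-1}| = |x'_j - x_j| \le
|x' - x|$ (Figure (v)). By the mean-value theorem in one dimension, there is a point
$w^{ij}$ on the segment from $z^{j-1}$ to $z^j$ such that $t_i(z^j) - t_i(z^{j-1}) = t_{ij}(w^{ij})(x'_j - x_j)$.
Since

$\bigl( D_x(x' - x) \bigr)_i = \sum_j t_{ij}(x)(x'_j - x_j)$,

it follows that

$|Tx' - Tx - D_x(x' - x)| \le \sum_i \Bigl| \sum_j \bigl[ t_i(z^j) - t_i(z^{j-1}) - t_{ij}(x)(x'_j - x_j) \bigr] \Bigr|$
$\qquad \le \sum_{ij} |t_{ij}(w^{ij}) - t_{ij}(x)| \cdot |x'_j - x_j|$
$\qquad \le k^2 \alpha |x' - x|$. ∎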


A35. The multivariable inverse-function theorem. Let $x_0$ be a point of the open set $U$.


Theorem. If $J(x_0) \ne 0$, then there are open sets $X$ and $Y$, containing $x_0$ and
$y_0 = Tx_0$, respectively, such that $T$ is a one-to-one map from $X$ onto $Y = TX$; further,
$T^{-1}\colon Y \to X$ is continuously differentiable, and the Jacobian matrix of $T^{-1}$ at $y$ is $D^{-1}_{T^{-1}y}$.


This is a local theorem. It is not assumed, as in Theorem 17.2, that $T$ is one-to-one
on $U$ and $J(x)$ never vanishes; but under those additional conditions, $TU$ is open and
the inverse point mapping is continuously differentiable. To understand the role of
the condition $J(x_0) \ne 0$, consider the case where $k = 1$, $x_0 = 0$, and $Tx$ is $x^2$ or $x^3$.


PROOF. Let $Q$ be a rectangle such that $x_0 \in Q^\circ \subset Q^- \subset U$ and $J(x) \ne 0$ for
$x \in Q^-$. As $(x, u)$ ranges over the compact set $Q^- \times [u\colon |u| = 1]$, $|D_x u|$ is bounded
below by some positive $\beta$:

(18)  $|D_x u| \ge \beta |u|$ if $x \in Q^-$, $u \in R^k$.

Making $Q$ smaller will ensure that $|t_{ij}(x) - t_{ij}(x')| \le \beta / 2k^2$ for all $x$ and $x'$ in $Q^-$
and all $i, j$. Then (16) and (18) give, for $x, x' \in Q^-$,

$|Tx' - Tx| \ge |D_x(x' - x)| - |Tx' - Tx - D_x(x' - x)|$
$\qquad \ge |D_x(x' - x)| - \frac{1}{2}\beta |x' - x| \ge \frac{1}{2}\beta |x' - x|$.

Thus

(19)  $|x' - x| \le \frac{2}{\beta} |Tx' - Tx|$ for $x, x' \in Q^-$.

This shows that $T$ is one-to-one on $Q^-$.

Since $x_0$ does not lie in the compact set $\partial Q$, $\inf_{x \in \partial Q} |Tx - Tx_0| = d > 0$. Let $Y$ be
the open ball with center $y_0 = Tx_0$ and radius $d/2$ (Figure (vi)). Fix a $y$ in $Y$. The
problem is to show that $y = Tx$ for some $x$ in $Q^\circ$, which means finding an $x$ such that
$\varphi(x) = |y - Tx|^2 = \sum_i (y_i - t_i(x))^2$ vanishes. By compactness, the minimum of $\varphi$
on $Q^-$ is achieved there. If $x \in \partial Q$ (and $y \in Y$), then $2|y - y_0| < d \le |Tx - y_0| \le |Tx -
y| + |y - y_0|$, so that $|y - Tx_0| < |y - Tx|$. Therefore, $\varphi(x_0) < \varphi(x)$ for $x \in \partial Q$, and so
the minimum occurs in $Q^\circ$ rather than on $\partial Q$. At the minimizing point, $\partial\varphi / \partial x_j =
-\sum_i 2(y_i - t_i(x)) t_{ij}(x) = 0$, and since $D_x$ is nonsingular, it follows that $y = Tx$: Each $y$
in $Y$ is the image under $T$ of some point $x$ in $Q^\circ$. By (19), this $x$ is unique (although
it is possible that $y = Tz$ for some $z$ outside $Q$).

Let $X = Q^\circ \cap T^{-1}Y$. Then $X$ is open and $T$ is a one-to-one map of $X$ onto $Y$.
Now let $T^{-1}$ denote the inverse point transformation on $Y$. By (19), $T^{-1}$ is continuous.

To prove differentiability, consider in $Y$ a fixed point $y$ and a variable point $y'$
such that $y' \to y$ and $y' \ne y$. Let $x = T^{-1}y$ and $x' = T^{-1}y'$; then $x'$ is a function of $y'$,
$x' \to x$, and $x' \ne x$. Define $v$ by $Tx' - Tx = D_x(x' - x) + v$; then $v$ is a function of $x'$
and hence of $y'$, and $|v| / |x' - x| \to 0$ by (17). Apply $D_x^{-1}$: $D_x^{-1}(Tx' - Tx) = x' - x +
D_x^{-1}v$, or $T^{-1}y' - T^{-1}y = D_x^{-1}(y' - y) - D_x^{-1}v$. By (18) and (19),

$\frac{|T^{-1}y' - T^{-1}y - D_x^{-1}(y' - y)|}{|y' - y|} = \frac{|D_x^{-1}v|}{|x' - x|} \cdot \frac{|x' - x|}{|y' - y|} \le \frac{|v|}{\beta |x' - x|} \cdot \frac{2}{\beta}$.

The right side goes to 0 as $y' \to y$.

By the remark following (17), the components of $D_x^{-1}$ must be the partial
derivatives of the inverse mapping: $T^{-1}$ has Jacobian matrix $D^{-1}_{T^{-1}y}$ at $y$. The
components of an inverse matrix vary continuously with the components of the original
matrix (think for example of the inverse as specified by the cofactors), and so $T^{-1}$ is
even continuously differentiable on $Y$. ∎


Continued Fractions 


A36. In designing a planetarium, Christian Huygens confronted this problem: Given 
the ratio x of the periods of two planets, approximate it by the ratio of the periods of 
two linked gears. If one gear has p teeth and the other has q, then the ratio of their 
periods is p/q, so that the problem is to approximate the real number x by the 
rational p/q. Of course x, being empirical, is already rational, but the numerator and 
denominator may be so large that gears with those numbers of teeth are not practical: 
in the approximation p/q, both p and q must be of moderate size. 




Since the gears play symmetrical roles, there is no more reason to approximate $x$
by $r = p/q$ than there is to approximate $1/x$ by $1/r = q/p$. Suppose, to be definite,
that $x$ and $r$ lie to the left of 1. Then

(20)  $|x - r| = xr \Bigl| \frac{1}{x} - \frac{1}{r} \Bigr| \le \Bigl| \frac{1}{x} - \frac{1}{r} \Bigr|$,

and the inequality is strict unless $x = r$: If $x < 1$, it is better to approximate $1/x$ and
then invert the approximation, since that will control both errors.
For a numerical illustration, approximate $x = .127$ by rounding it up to $r = .13$; the
calculations (to three places) are

(21)  $r - x = .13 - .127 = .003$,
      $\frac{1}{x} - \frac{1}{r} = 7.874 - 7.692 = .182$.

The second error is large. So instead, approximate $1/x = 7.874$ by rounding it down to
$1/r' = 7.87$. Since $1/7.87 = .1271$ (to four places), the calculations are

(22)  $\frac{1}{x} - \frac{1}{r'} = 7.874 - 7.87 = .004$,
      $r' - x = .1271 - .127 = .0001$.

This time, both errors are small, and the error .0001 in the new approximation to $x$ is
smaller than the corresponding .003 in (21). It is because $x$ lies to the left of 1 that
inversion improves the accuracy; see (20).
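
The arithmetic of (21) and (22) can be reproduced directly. The following sketch assumes floating-point evaluation and a hypothetical helper `errors`; it does not round $r'$ to four places, so its error in approximating $x$ comes out somewhat smaller than the .0001 quoted above.

```python
def errors(x, r):
    """Return (|x - r|, |1/x - 1/r|), the two errors compared in (20)."""
    return abs(x - r), abs(1 / x - 1 / r)


x = 0.127

# Round x itself up to .13, as in (21).
direct_error, direct_inverse_error = errors(x, 0.13)

# Round 1/x = 7.874... down to 7.87 and invert, as in (22).
inverted_error, inverted_inverse_error = errors(x, 1 / 7.87)

# Inequality (20): for x and r to the left of 1, |x - r| <= |1/x - 1/r|.
inequality_20_holds = (direct_error <= direct_inverse_error
                       and inverted_error <= inverted_inverse_error)
```

Rounded to three places, the first pair of errors is .003 and .182 and the inverse error of the second pair is .004, in agreement with the displayed calculations.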

If this inversion method decreases the error, why not do another inversion in the
middle, in finding a rational approximation to $1/x$? It makes no sense to invert $1/x$
itself, since it lies to the right of 1 (and inversion merely leads back to $x$ anyway). But
to approximate $1/x = 7.874$ is to approximate the fractional part .874, and here a
second inversion will help, for the same reason the first one does. This suggests
Huygens's iterative procedure.

In modern notation, the scheme is this. For $x$ (rational or irrational) in $(0, 1)$, let
$Tx = \{1/x\}$ and $a_1(x) = \lfloor 1/x \rfloor$ be the fractional and integral parts of $1/x$; and set
$T0 = 0$. This defines a mapping of $[0, 1)$ onto itself:

(23)  $Tx = \Bigl\{ \frac{1}{x} \Bigr\} = \frac{1}{x} - \Bigl\lfloor \frac{1}{x} \Bigr\rfloor = \frac{1}{x} - a_1(x)$ if $0 < x < 1$,
      $Tx = 0$ if $x = 0$.

Then

(24)  $x = \frac{1}{a_1(x) + Tx}$ if $0 < x < 1$.

What (20) says is that replacing $Tx$ on the right in (24) by a good rational approximation
to it gives an even better rational approximation to $x$ itself.




To carry this further requires a convenient notation. For positive variables $z_i$,
define the continued fractions

$1\lceil z_1 = \frac{1}{z_1}, \qquad 1\lceil z_1 + 1\lceil z_2 = \frac{1}{z_1 + \frac{1}{z_2}}, \qquad 1\lceil z_1 + 1\lceil z_2 + 1\lceil z_3 = \frac{1}{z_1 + \frac{1}{z_2 + \frac{1}{z_3}}}$,

and so on. It is “typographically” clear that

(25)  $1\lceil z_1 + \cdots + 1\lceil z_n = 1 / [z_1 + (1\lceil z_2 + \cdots + 1\lceil z_n)]$

and

(26)  $1\lceil z_1 + \cdots + 1\lceil z_n = 1\lceil z_1 + \cdots + 1\lceil z_{n-2} + 1\lceil (z_{n-1} + 1/z_n)$.

For a formal theory, use (25) as a recursive definition and then prove (26) by
induction (or vice versa). An infinite continued fraction is defined by

$1\lceil z_1 + 1\lceil z_2 + \cdots = \lim_n \bigl( 1\lceil z_1 + \cdots + 1\lceil z_n \bigr)$,

provided the limit exists. A continued fraction is simple if the $z_i$ are positive integers.
If $T^{n-1}x > 0$, let $a_n(x) = a_1(T^{n-1}x)$; the $a_n(x)$ are the partial quotients of $x$. If $x$
and $Tx$ are both positive, then (24) applies to each of them:

$x = 1\lceil a_1(x) + Tx = 1\lceil a_1(x) + 1\lceil a_2(x) + T^2x$.

If none of the iterates $x, Tx, \ldots, T^{n-1}x$ vanishes, then it follows by induction (use
(26)) that

(27)  $x = 1\lceil a_1(x) + \cdots + 1\lceil a_{n-1}(x) + 1\lceil a_n(x) + T^n x$.

This is an extension of (24), and the idea following (24) extends as well: a good
rational approximation to $T^n x$ in (27) gives a still better rational approximation to $x$.
Even if $T^n x$ is approximated very crudely by 0, there results a sharp approximation

(28)  $x \approx 1\lceil a_1(x) + \cdots + 1\lceil a_n(x)$

to $x$ itself. The right side here is the $n$th convergent to $x$, and it goes very rapidly to
$x$; see Section 24.

By the definition (23), $x$ and $Tx$ are both rational or both irrational. For an
irrational $x$, therefore, $T^n x$ remains forever among the irrationals, and (27) holds for
all $n$. If $x$ is rational, on the other hand, $T^n x$ remains forever among the rationals,
and in fact, as the following argument shows, $T^n x$ eventually hits 0 and stays there.




Suppose that $x$ is a rational in $(0, 1)$: $x = d_1/d_0$, where $0 < d_1 < d_0$. If $Tx > 0$, then
$Tx = \{d_0/d_1\} = d_2/d_1$, where $0 < d_2 < d_1$ because $0 < Tx < 1$. (If $d_1/d_0$ is irreducible,
so is $d_2/d_1$.) If $T^2x > 0$, the argument can be repeated:

(29)  $x = \frac{d_1}{d_0}, \quad Tx = \frac{d_2}{d_1}, \quad T^2x = \frac{d_3}{d_2}, \qquad d_0 > d_1 > d_2 > d_3 > 0$.

And so on. Since the $d_n$ decrease as long as the $T^n x$ remain positive, $T^n x$ must
vanish for some $n$, and then $T^m x = 0$ for $m \ge n$. If $n_x$ is the smallest integer for which
$T^n x = 0$ ($n_x \ge 1$ if $x > 0$), then by (27),

(30)  $x = 1\lceil a_1(x) + \cdots + 1\lceil a_{n_x}(x)$.

Thus each positive rational has a representation as a finite simple continued fraction.
If $0 < x < 1$ and $Tx = 0$, then $1 > x = 1/a_1(x)$, so that $a_1(x) \ge 2$. Applied to $T^{n_x - 1}x$,
this shows that the $a_{n_x}(x)$ in (30) must be at least 2.
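
The iteration of $T$ on a rational can be carried out exactly. The sketch below is illustrative only: `gauss_map`, `partial_quotients`, and `evaluate` are hypothetical names for the map $T$ of (23), the list $a_1(x), \ldots, a_{n_x}(x)$, and the value of a finite continued fraction computed from the inside out as in (25); the sample value 127/1000 is the $x = .127$ of the numerical illustration.

```python
from fractions import Fraction
from math import floor


def gauss_map(x):
    """The map T of (23): fractional part of 1/x, with T(0) = 0."""
    if x == 0:
        return Fraction(0)
    inverse = 1 / x
    return inverse - floor(inverse)


def partial_quotients(x):
    """Return [a_1(x), ..., a_{n_x}(x)] for a rational x in (0, 1)."""
    quotients = []
    while x != 0:
        quotients.append(floor(1 / x))  # a_1 of the current iterate
        x = gauss_map(x)
    return quotients


def evaluate(quotients):
    """Value of the finite continued fraction with the given partial quotients."""
    value = Fraction(0)
    for a in reversed(quotients):
        value = 1 / (a + value)
    return value


x = Fraction(127, 1000)
quotients = partial_quotients(x)
convergents = [evaluate(quotients[:n]) for n in range(1, len(quotients) + 1)]
```

For this $x$ the loop stops after five steps, the last partial quotient is at least 2 as the text asserts, and the final convergent reproduces $x$ exactly.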

Section 24 requires a uniqueness result. Suppose that


(31) $x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n + t}}},$

where the $a_i$ are positive integers and

(32) $0 < x < 1, \quad 0 \le t < 1, \quad a_n + t > 1.$


The last condition rules out $a_n = 1$ and $t = 0$ (which in the case $n = 1$ is also ruled out
by $x < 1$). It follows from (31) and (32) that


(33) $a_1(x) = a_1, \ldots, a_n(x) = a_n, \quad T^nx = t.$

The case $n = 1$ being easy, suppose the implication holds for $n - 1$, where $n \ge 2$.
Since $0 < 1/(a_n + t) < 1$, the induction hypothesis (use (26)) gives $a_k(x) = a_k$ for
$k < n$ and $T^{n-1}x = 1/(a_n + t)$. Now apply the case $n = 1$ to $T^{n-1}x$. (If $a_n = 1$ and
$t = 0$, then $a_k(x) = a_k$ for $k \le n - 2$, $a_{n-1}(x) = a_{n-1} + 1$, and $T^{n-1}x = 0$.)
Consider now the infinite case. Assume that


(34) $x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$


converges, where the $a_n$ are positive integers. Then

(35) $a_n(x) = a_n, \quad T^nx = \cfrac{1}{a_{n+1} + \cfrac{1}{a_{n+2} + \cdots}}, \qquad n \ge 1.$


To prove this, let $n \to \infty$ in (25): the continued fraction $t = \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}$ converges
and $x = 1/(a_1 + t)$. It follows by induction (use (26)) that


(36) $\cfrac{1}{a_1} > \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}} \ge \cfrac{1}{a_1 + \cfrac{1}{a_2}}, \qquad n \ge 2.$


Hence $0 < x < 1$, and the same must be true of $t$. Therefore, $a_1$ and $t$ are the integer
and fractional parts of $1/x$, which proves (35) for $n = 1$. Apply the same argument to
$Tx$, and continue. The $x$ defined by (34) is irrational: otherwise, $T^nx = 0$ for some $n$,
which contradicts (35) and (36).

Thus the value of an infinite simple continued fraction uniquely determines the
partial quotients. The same is almost true of finite simple continued fractions. Since
(31) and (32) imply (33), it follows that if $x$ is given by (30), then any continued
fraction of $n_x$ terms that represents $x$ must indeed match (30) term for term. But, for
example, $\cfrac{1}{3 + \cfrac{1}{5}} = \cfrac{1}{3 + \cfrac{1}{4 + \cfrac{1}{1}}}$. This is always possible: replace $a_{n_x}(x)$ in (30)
(where $a_{n_x}(x) \ge 2$) by $a_{n_x}(x) - 1 + \cfrac{1}{1}$. Apart from this ambiguity, the representation
is unique, and the representation (30) that results from repeated application of $T$ to
a rational $x$ never ends with a partial quotient of 1.*
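
The ambiguity is easy to confirm with exact arithmetic. The sketch below uses hypothetical helper names (`evaluate`, `alternate_form`) and treats a finite simple continued fraction as the list of its partial quotients.

```python
from fractions import Fraction

def evaluate(quotients):
    """Value of 1/(a_1 + 1/(a_2 + ... + 1/a_n)) as an exact fraction."""
    value = Fraction(0)
    for a in reversed(quotients):
        value = 1 / (a + value)
    return value

def alternate_form(quotients):
    """Replace a final quotient a >= 2 by the pair a - 1, 1."""
    *head, last = quotients
    if last < 2:
        raise ValueError("the final quotient must be at least 2")
    return head + [last - 1, 1]

canonical = [3, 5]
longer = alternate_form(canonical)
same_value = evaluate(canonical) == evaluate(longer)
```

Both lists evaluate to 5/16, but only the first is the representation (30) produced by iterating $T$, since the second ends with a partial quotient of 1.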


*See Rockett & Szüsz for more on continued fractions.


Notes on the Problems 


These notes consist of hints, solutions, and references to the literature. As a rule a 
solution is complete in proportion to the frequency with which it is needed for the 
solution of subsequent problems. 


Section 1 


1.1. (a) Each point of the discrete space lies in one of the four sets $A_1 \cap A_2$,
$A_1^c \cap A_2$, $A_1 \cap A_2^c$, $A_1^c \cap A_2^c$ and hence would have probability at most $2^{-2}$;
continue.

(b) If, for each $i$, $B_i$ is $A_i$ or $A_i^c$, then $B_1 \cap \cdots \cap B_n$ has probability at most
$\prod_{i=1}^n (1 - \alpha_i) \le \exp[-\sum_{i=1}^n \alpha_i]$.


1.3. (b) Suppose $A$ is trifling and let $A^-$ be its closure. Given $\epsilon$ choose intervals
$(a_k, b_k]$, $k = 1, \ldots, m$, such that $A \subset \bigcup_{k=1}^m (a_k, b_k]$ and $\sum_{k=1}^m (b_k - a_k) < \epsilon/2$. If
$x_k = a_k - \epsilon/2m$, then $A^- \subset \bigcup_{k=1}^m (x_k, b_k]$ and $\sum_{k=1}^m (b_k - x_k) < \epsilon$.

For the other parts of the problem, consider the set of rationals in (0, 1).


1.4. (a) Cover $A_r(i)$ by $(r - 1)^n$ intervals of length $r^{-n}$.

(c) Go to the base $r^k$. Identify the digits in the base $r$ with the keys of the
typewriter. The monkey is certain eventually to reproduce the eleventh edition
of the Britannica and even, unhappily, the fifteenth.


1.5. (a) The set $A_3(1)$ is itself uncountable, since a point in it is specified by a
sequence of 0's and 2's (excluding the countably many that end in 0's).

(b) For sequences $u_1, \ldots, u_n$ of 0's, 1's, and 2's, let $M_{u_1 \ldots u_n}$ consist of the points
in (0,1] whose nonterminating base-3 expansions start out with those digits.
Then $A_3(1) = (0, 1] - \bigcup M_{u_1 \ldots u_n}$, where the union extends over $n \ge 1$ and the
sequences $u_1, \ldots, u_n$ containing at least one 1. The set described in part (b) is
$[0, 1] - \bigcup M^{\circ}_{u_1 \ldots u_n}$, where the union is as before, and this is the closure of $A_3(1)$.
From this representation of $C$, it is not hard to deduce that it can be defined
as the set of points in [0,1] that can be written in base 3 without any 1's if
terminating expansions are also allowed. For example, $C$ contains $\tfrac{2}{3} = .1222\ldots
= .2000\ldots$ because it is possible to avoid 1 in the expansion.

(c) Given $\epsilon$ and an $\omega$ in $C$, choose $\omega'$ in $A_3(1)$ within $\epsilon/2$ of $\omega$; now define $\omega''$
by changing from 2 to 0 some digit of $\omega'$ far enough out that $\omega''$ differs from $\omega'$
by at most $\epsilon/2$.
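
The base-3 description of $C$ in the note on 1.5(b) lends itself to a small membership test for rational points. The sketch below is an illustration only, and the function name `in_cantor_set` is hypothetical. It strips one ternary digit at a time, and it relies on the fact that the orbit of a rational under this digit-stripping is finite.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether a rational x in [0, 1] lies in the Cantor set C."""
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:          # a repeated value means the digits cycle
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x             # an expansion with leading digit 0 exists
        elif x >= Fraction(2, 3):
            x = 3 * x - 2         # an expansion with leading digit 2 exists
        else:
            return False          # every expansion of x starts with digit 1
    return True

two_thirds_in_C = in_cantor_set(Fraction(2, 3))
one_half_in_C = in_cantor_set(Fraction(1, 2))
```

The point 2/3 is accepted because the terminating expansion .2000... avoids the digit 1, whereas 1/2 = .111... is rejected at the first step.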


1.7. The interchange of limit and integral is justified because the series $\sum_k r_k(\omega)2^{-k}$
converges uniformly in $\omega$ (integration to the limit is studied systematically
in Section 16). There is a direct derivation of (1.40): let $n \to \infty$ in $\sin t =
2^n \sin 2^{-n}t \prod_{k=1}^n \cos 2^{-k}t$, which follows by induction from the half-angle
formula.


1.10. (a) Given $m$ and a subinterval $(a, b]$ of (0,1], choose a dyadic interval $I$ in
$(a, b]$, and then choose in $I$ a dyadic interval $J$ of order $n > m$ such that
$|n^{-1}s_n(\omega)| > \tfrac12$ for $\omega \in J$. This is possible because to specify $J$ is to specify the
first $n$ dyadic digits of the points in $J$; choose the first digits in such a way that
$J \subset I$ and take the following ones to be 1, with $n$ so large that $n^{-1}s_n(\omega)$ is near 1
for $\omega \in J$.

(b) A countable union of sets of the first category is also of the first category;
$(0, 1] = N \cup N^c$ would be of the first category if $N^c$ were. For Baire's theorem,
see ROYDEN, p. 139.


. (a) If $x = p_0/q_0 \ne p/q$, then


$\left| x - \dfrac{p}{q} \right| = \dfrac{|p_0 q - q_0 p|}{q_0 q} \ge \dfrac{1}{q_0 q}.$


(c) The rational X7.,1/2^ has denominator 2^"? and approximates x to 
within 2/24 * D. 


Section 2 


2.3. 


2.4. 


2.5.


2.8. 


2.9. 


2.10. 


2.11. 


(b) Let $\Omega$ consist of four points, and let $\mathcal{F}$ consist of the empty set, $\Omega$ itself,
and all six of the two-point sets.


(b) For example, take $\Omega$ to consist of the integers, and let $\mathcal{F}_n$ be the $\sigma$-field
generated by the singletons $\{k\}$ with $k \le n$. As a matter of fact, any example in
which $\mathcal{F}_n$ is a proper subclass of $\mathcal{F}_{n+1}$ for all $n$ will do, because it can be
shown that in this case $\bigcup_n \mathcal{F}_n$ necessarily fails to be a $\sigma$-field; see A. Broughton
and B. W. Huff: A comment on unions of sigma-fields, Amer. Math. Monthly,
84 (1977), 553-554.


(b) The class in question is certainly contained in $f(\mathcal{A})$ and is easily seen to be
closed under the formation of finite intersections. But $(\bigcup_{i=1}^m \bigcap_{j=1}^{n_i} A_{ij})^c =
\bigcap_{i=1}^m \bigcup_{j=1}^{n_i} A_{ij}^c$, and $\bigcup_{j=1}^{n_i} A_{ij}^c = \bigcup_{j=1}^{n_i} [A_{ij}^c \cap \bigcap_{k=1}^{j-1} A_{ik}]$ has the required form.
If $\mathcal{K}$ is the smallest class over $\mathcal{A}$ closed under the formation of countable
unions and intersections, clearly $\mathcal{K} \subset \sigma(\mathcal{A})$. To prove the reverse inclusion, first
show that the class of $A$ such that $A^c \in \mathcal{K}$ is closed under the formation of
countable unions and intersections and contains $\mathcal{A}$ and hence contains $\mathcal{K}$.


Note that $\bigcup_n B_n \in \sigma(\bigcup_n \mathcal{A}_{B_n})$.


(a) Show that the class of $A$ for which $I_A(\omega) = I_A(\omega')$ is a $\sigma$-field. See
Example 4.8.


(b) Suppose that $\mathcal{F}$ is the $\sigma$-field of the countable and the cocountable sets in
$\Omega$. Suppose that $\mathcal{F}$ is countably generated and $\Omega$ is uncountable. Show that $\mathcal{F}$




2.12. 


2.18. 


2.19. 


2.21. 


2.22. 




is generated by a countable class of singletons; if $\Omega_0$ is the union of these, then
$\mathcal{F}$ must consist of the sets $B$ and $B \cup \Omega_0^c$ with $B \subset \Omega_0$, and these do not
include the singletons in $\Omega_0^c$, which is uncountable because $\Omega$ is.

(c) Let $\mathcal{F}_2$ consist of the Borel sets in $\Omega = (0, 1]$, and let $\mathcal{F}_1$ consist of the
countable and the cocountable sets there.


Suppose that $A_1, A_2, \ldots$ is an infinite sequence of distinct sets in a $\sigma$-field $\mathcal{F}$,
and let $\mathcal{G}$ consist of the nonempty sets of the form $\bigcap_{n=1}^{\infty} B_n$, where $B_n = A_n$ or
$B_n = A_n^c$, $n = 1, 2, \ldots$. Each $A_n$ is the union of the $\mathcal{G}$-sets it contains, and since
the $A_n$ are distinct, $\mathcal{G}$ must be infinite. But there are uncountably many distinct
countable unions of $\mathcal{G}$-sets, and they all lie in $\mathcal{F}$.


For this and the subsequent problems on applications of probability theory to 
arithmetic, the only number theory required is the fundamental theorem of 
arithmetic and its immediate consequences. The other problems on stochastic 
arithmetic are 4.15, 4.16, 5.19, 5.20, 6.16, 18.17, 25.15, 30.9, 30.10, 30.11, and 
30.12. See also Theorem 30.3. 


(b) Let A consist of the even integers, let C, = [m: v, < m € v,,,], and let B 
consist of the even integers in C, U C4U -++ together with the odd integers in 
C,UC,U ::: ; take v, to increase very rapidly with k and consider A N B. 


(c) If c is the least common multiple of a and b, then M, M, = M.. From 
M,c 2 conclude in succession that M; 0AM € Z, M, ---n 
MM (YM; € QJ, f(.4) c 2. By the same sequence of steps, show 


a 


how D on .Z determines D on f C^). 


(d) If B, M, — U,.,M,,, then a € B, and (the inclusion-exclusion formula 


requires only finite additivity) 


D(Bj) 


Choose l, so that, if C, = B,, then D(C,) «2^^-!. If D were a probability 


measure on f(.@), D(Q) < A would follow. See Problem 4.16 for a different 
approach. 


(a) Apply the intermediate-value theorem to the function $f(x) = \lambda(A \cap (0, x])$.
Note that this even proves part (c) for $\lambda$ (under the assumption that $\lambda$ exists).
(b) If $0 < P(B) < P(A)$, then either $0 < P(B) \le \tfrac12 P(A)$ or $0 < P(A - B)
\le \tfrac12 P(A)$. Continue.

(c) If P(U,H,) €x, choose C so that CCA- U,H, and 0<P(C)<x- 


P(U,H,). If n^! « P(C), then P 
POEM, H+ nen. PRU reni) Gime BU pcs Ho) WB Ma 


(c) If 4_, were a o-field, 4 C.£ would follow. 


Use the fact that, if $\alpha_1, \alpha_2, \ldots$ is a sequence of ordinals satisfying $\alpha_n < \Omega$, then
there exists an ordinal $\alpha$ such that $\alpha < \Omega$ and $\alpha_n < \alpha$ for all $n$.




2.23. Suppose that B€ Ug ¿w j = 12 n n Choose odd integers n; in such a way 


that B, € A and the n; are all distinct; choose .Z;-sets such that 


MCN 


a (tj) 


for n not of the form nj, choose .⁄Z-sets for which Bg (,(C,, ,C,, ,...) is Ø or 
(0,1] as n is odd or even. Then U7.,B; = ®,(C;,C,...). Similarly, B* = 
Ci C;,...) for JZysets C, ifB E U s < Za. The rest of the proof is essen- 
tially the same as before. 


Section 3 


3.1. (a) The finite additivity of $P$ is used in the proof that $\mathcal{F}_0 \subset \mathcal{M}$ and again (via
monotonicity; see (2.5)) in the proof of (3.7). The countable additivity of $P$ is
used (via countable subadditivity; see Theorem 2.1) in the proof of (3.7).

(b) For a specific example consider $\bigcup_n (\tfrac12 + n^{-1}, 1]$ in connection with
Problem 2.15. But an example is provided by every $P$ that is finitely but not countably
additive: If $P$ is finitely additive on a field $\mathcal{F}_0$, and $A_1, A_2, \ldots$ are disjoint $\mathcal{F}_0$-sets
whose union $A$ also lies in $\mathcal{F}_0$, then monotonicity (which requires
finite additivity only) gives $\sum_{k \le n} P(A_k) = P(\bigcup_{k \le n} A_k) \le P(A)$ and hence
$\sum_k P(A_k) \le P(A)$. Countable subadditivity will ensure that there is equality
here.

(c) The proof of (3.7) involves the countable subadditivity of $P$ on $\mathcal{F}_0$, which is
only assumed to be a field (that being the whole point of the theorem).


3.2. (a) Given $\epsilon$, choose $\mathcal{F}_0$-sets $A_n$ such that $A \subset \bigcup_n A_n$ and $\sum_n P(A_n) < P^*(A) +
\epsilon$; if $B = \bigcup_n A_n$, then $A \subset B$, $B \in \mathcal{F}$, and $P(B) \le P^*(A) + \epsilon$; hence the right
side of (3.9) is at most $P^*(A)$. On the other hand, $A \subset B$ and $B \in \mathcal{F}$ imply
$P^*(A) \le P^*(B) = P(B)$. Hence (3.9). If $A \subset B_n$, $B_n \in \mathcal{F}$, $P(B_n) < P^*(A) + n^{-1}$,
and $B = \bigcap_n B_n$, then $A \subset B$, $B \in \mathcal{F}$, and $P^*(A) = P(B)$. For (3.10), argue by
complementation.

(b) Suppose that $P_*(A) = P^*(A)$ and choose $\mathcal{F}$-sets $A_1$ and $A_2$ in such a way
that $A_1 \subset A \subset A_2$ and $P(A_1) = P(A_2)$. Given $E$, choose an $\mathcal{F}$-set $B$ in such a
way that $E \subset B$ and $P^*(E) = P(B)$. Then $P^*(A \cap E) + P^*(A^c \cap E) \le
P(A_2 \cap B) + P(A_1^c \cap B)$. Now use (2.7) to bound the last sum by $P(B) + P(A_2
- A_1) = P^*(E)$.


3.3. First note the general fact that $P^*$ agrees with $P$ on $\mathcal{F}_0$ if and only if $P$ is
countably additive there, a condition not satisfied in parts (b) and (e). Using
Problem 3.2 simplifies the analysis of $P^*$ and $\mathcal{M}(P^*)$ in the other parts.

Note in parts (b) and (e) that, if $P^*$ and $P_*$ are defined by (3.1) and (3.2),
then, since $P^*(A) = 0$ for all $A$, (3.4) holds for all $A$ and (3.3) holds for no $A$.
Countable additivity thus plays an essential role in Problem 3.2.


3.6 (c) Split E^ by A: P. (E)=1 —P°(E°s)=1 —P°(A n ES) - P(A'n E) 21— 


3.7.


P*(AnE*) - P(A‘) = P(A) - P (A — E). 


(b) Apply (3.13): For A € .%, Q(A)  P(HnA) - P,(H* nA) = p? 
POD OC ARS GHE UA S PCA). (HAA) 


(c) If A, and A, are disjoint Fy-sets, then by (3.12), 
P°(HN(A,VA,)) » P(HnA,) TE CHO 




3.14. 


3.18. 


3.19. 


3.20. 




Apply (3.13) to the three terms in this equation, successively using A, VAs, Aj; 
and A, for A: 


P (H AA OA.) - P,(H* AA) + P,(H* ñA). 


But for these two equations to hold it is enough that HM A, 1A; = Q in the 
first case and H° NA, A, = Q in the second (replacing A, “by A, NAS 
changes nothing). 


. By using Banach limits (BANACH, p. 34) one can similarly prove that density $D$
on the class $\mathcal{D}$ (Problem 2.18) extends to a finitely additive probability on the
class of all subsets of $\Omega = \{1, 2, \ldots\}$.


The argument is based on cardinality. Since the Cantor set $C$ has Lebesgue
measure 0, $2^C$ is contained in the class $\mathcal{L}$ of Lebesgue sets in (0, 1]. But $C$ is
uncountable: $\operatorname{card} \mathcal{B} = \operatorname{card}(0, 1] < \operatorname{card} 2^C \le \operatorname{card} \mathcal{L}$.


(a) Since the $A \oplus r$ are disjoint Borel sets, $\sum_r \lambda(A \oplus r) \le 1$, and so the common
value $\lambda(A)$ of the $\lambda(A \oplus r)$ must be 0. Similarly, if $A$ is a Borel set contained in
some $H \oplus r$, then $\lambda(A) = 0$.

(b) If the $E \cap (H \oplus r)$ are all Borel sets, they all have Lebesgue measure 0, and
so $E$ is a Borel set of Lebesgue measure 0.


(b) Given A, Bis... Án- B, ,, note that their union C, is nowhere dense, 
so that J, contains an interval J, disjoint from C,. Choose in J, disjoint, 
nowhere dense sets A, and B, of positive measure. 


(c) Note that A and B, are disjoint and that A, UB, CG. 


(a) If I. are disjoint open intervals with union G, then b 'A(A) > E, AC) > 
A 2E ACA (eb 1A A). 


Section 4 


4.1. 


4.10. 


4.14. 


Let $r$ be the quantity on the right in (4.30), assumed finite. Suppose that $x < r$;
then $x < \bigvee_{k=n}^{\infty} x_k$ for $n \ge 1$ and hence $x < x_k$ for some $k \ge n$: $x < x_n$ i.o.
Suppose that $x < x_n$ i.o.; then $x < \bigvee_{k=n}^{\infty} x_k$ for $n \ge 1$: $x \le r$. It follows that
$r = \sup[x\colon x < x_n \text{ i.o.}]$, which is easily seen to be the supremum of the limit
points of the sequence. The argument for (4.31) is similar.


The class .7 is the o-field generated by ¥YU{H) (Problem 2.7(a)). If 
(H N G,) U (H° N G1) =(HMG))U (HS G5), then G,^G' c H^ and 
G; G, CH: con viency now follows because A,(H)=A, (H: )-20. If A 
(Hn G) U (H° n GY”) are disjoint, then G” N 1G" c H° and GY” N G$' De e 
H for m =n, and (hereto (see Bicblemn 2.17) P(U,A,) = 3X(U, G) 
+ 3A(U G9) = F,GA(G%) + 3A(GU?)) = X, P(A,). The intervals with ralio: 
v endo sitis u A e, 


Show as in Problem 1.1(b) that the maximum of $P(B_1 \cap \cdots \cap B_n)$, where $B_i$ is
$A_i$ or $A_i^c$, goes to 0. Let $A_x = [\omega\colon \sum_n I_{A_n}(\omega)2^{-n} \le x]$, show that $P(A \cap A_x)$ is
continuous in $x$, and proceed as in Problem 2.19(a).




4.15. Calculate D(F,) by (2.36) and the inclusion—exclusion formula, and estimate 
P,(F,— F) by subadditivity; now use 0 < P(Fj)) - P(F) = P,(F,— F}. For the 
calculation of the infinite product, see HARDY & WRIGHT, p. 246.


Section 5 


5.5. (a) If $m = 0$, $\alpha > 0$, and $x \ge 0$, then $P[X \ge \alpha] \le P[(X + x)^2 \ge (\alpha + x)^2] \le
E[(X + x)^2]/(\alpha + x)^2 = (\sigma^2 + x^2)/(\alpha + x)^2$; minimize over $x$.
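
The minimization left to the reader can be written out as follows. This is a sketch of one way to carry it out; the name $g$ for the bound is introduced here for convenience and is not in the original note.

```latex
\[
  g(x) = \frac{\sigma^2 + x^2}{(\alpha + x)^2}, \qquad
  g'(x) = \frac{2x(\alpha + x) - 2(\sigma^2 + x^2)}{(\alpha + x)^3}
        = \frac{2(\alpha x - \sigma^2)}{(\alpha + x)^3},
\]
so $g$ decreases for $x < \sigma^2/\alpha$ and increases afterwards. At the
minimizer $x = \sigma^2/\alpha$,
\[
  g\!\left(\frac{\sigma^2}{\alpha}\right)
  = \frac{\sigma^2 (\alpha^2 + \sigma^2)/\alpha^2}{(\alpha^2 + \sigma^2)^2/\alpha^2}
  = \frac{\sigma^2}{\sigma^2 + \alpha^2},
  \qquad\text{hence}\qquad
  P[X \ge \alpha] \le \frac{\sigma^2}{\sigma^2 + \alpha^2}.
\]
```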


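5.8. (b) It is enough to prove that $\psi(t) = f(t(x', y') + (1 - t)(x, y))$ is convex in $t$
($0 \le t \le 1$) for $(x, y)$ and $(x', y')$ in $C$. If $\alpha = x' - x$ and $\beta = y' - y$, then (if
$f_{11} > 0$)


$\psi'' = f_{11}\alpha^2 + 2 f_{12}\alpha\beta + f_{22}\beta^2
= \dfrac{1}{f_{11}}(f_{11}\alpha + f_{12}\beta)^2 + \dfrac{1}{f_{11}}(f_{11} f_{22} - f_{12}^2)\beta^2 \ge 0.$


Examples like $f(x, y) = y^2 - 2xy$ show that convexity in each variable separately
does not imply convexity.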
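
The counterexample can be checked numerically with the midpoint inequality. The sketch below assumes the example reads $f(x, y) = y^2 - 2xy$; the helper name `midpoint_gap` is hypothetical.

```python
def f(x, y):
    return y**2 - 2 * x * y

def midpoint_gap(p, q):
    """f(midpoint) minus the average of f at p and q.

    A positive value contradicts convexity along the segment from p to q.
    """
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return f(mx, my) - (f(*p) + f(*q)) / 2

along_x = midpoint_gap((2, 1), (-4, 1))     # y fixed: f is linear in x
along_y = midpoint_gap((0, 3), (0, -1))     # x fixed: f is convex in y
diagonal = midpoint_gap((1, 1), (-1, -1))   # both variables move together
```

Along each coordinate direction the gap is nonpositive, but along the diagonal it is positive, so $f$ is not convex even though it is convex in each variable separately.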


5.9. Check (5.39) for $f(x, y) = -x^{1/p}y^{1/q}$.
5.10. Check (5.39) for $f(x, y) = -(x^{1/p} + y^{1/p})^p$.


5.19. For (5.43) use (2.36) and the fundamental theorem of arithmetic: since the $p_i$
are distinct, the $p_i^{k_i}$ individually divide $m$ if and only if their product does. For
(5.44) use inclusion-exclusion. For (5.47), use (5.29) (see Problem 5.12).


5.20. (a) By (5.47), E,[a,] < X;-1p “<2/p. And, of course, n `! log n! = E,[log] = 
LE, la, llog p. 
(b) Use (5.48) and the fact that E,la, —8,] x X; 5p *. 
(c) By (5.49), 


2n n 
E €» E lis 
n<p<2n n<p<2n 


< 2n( E; [log* ] — E,[log* ]) = O(n). 


Deduce (5.50) by splitting the range of summation by successive powers of 2. 
(d) If K bounds the O(1) terms in (5.51), then 


Y logpzóx Y, p 'loggpzó6x(log0 ! -2K). 


psx 0x «p €x 


(e) For (5.53) use 


log p j log p 
L «&m(x) s ), 1+ — 3 
7à 
M log x pae ZA log x 


< x1⁄2 + 


res 




By (5.53), $\pi(x) \ge x^{1/2}$ for large $x$, and hence $\log \pi(x) \asymp \log x$ and $\pi(x) \asymp
x/\log \pi(x)$. Apply this with $x = p_r$ and note that $\pi(p_r) = r$.


Section 6 
6.3. Since for given values of X,;(w),...,Xn,~—1@) there are for X,(w) the k 
possible values 0,1,...,k — 1, the mere of values of (X,,,(w),.. t DY is 
n!. Therefore, the map w > (X,,,(@),..., X;,(@)) is one-to-one, and the X "mc 


determine w. It follows that if 0 xx, < fon des ess, iato (bc mois: of 


permutations w satisfying X,;(w) —x;, 1 <i € k, is just (k + 1)(k +2)::: n, so 
that P[X,; =x; ) < =k] = 1/k!. It now follows by induction on k that 


ni 


X,,,..., X,, are independent and P[X,, =x]= k! (0 <x < k). 
Now calculate 
Elide 
gone. 10-0 ee a 
Var[ X,, ] = = s a =(54)- PM 
Var[S,] = 45 D e 2n? E — 5n M 


k=1 
Apply Chebyshev’s inequality. 


6.7. (a) If k2<n<(k+1)’, let a, =k’; if M bounds the |x,l, then 


= => (J). 


— | -nM + + (n-a,)M=2M" 


n 


n 


6.16. From (5.53) and (5.54) it follows that a, — X,n ![n/p| — œ. The left side of 
(6.8) is 


lyn 4- ale JE. 1 
Blase red rm a mar ss o) =o SS 
A ARI] pq p n q n np nq 
Section 7 
7.3. If one grants that there are only countably many effective rules, the result is an
immediate consequence of the mathematics of this and the preceding sections:
$C$ is a countable intersection of $\mathcal{F}$-sets of measure 1. The argument proves in


particular the nontrivial fact that collectives exist. 


7.7. If n <7, then W, -W, ,— X, 4, = W, — S, ,, and r is the smallest n for which 
$,.; = W,. Use ‘Op 8) i the Ades of whether the game terminates. Now 


T—1 


I Hot 2, (W ~ S,-4) X, 9 Fo + WAS, ., — 3(S2., Pa 7x 1)). 
k=1 




7.8. Let x,,...,x; be the initial pattern and put 3)=x,+--: +x;. Define 2, = 
X, ,-W,X, Lo = k, and L, = L, _, — (3X, + 1)/2. Then + is the smallest n 
such that L, <0, and + is by the strong law finite with probability 1 if 
E[3X,, + 1] = 6( p — i)>0. For n <r, X, is the sum of the pattern used to 
determine W, , ,. Since F, — F, ,-X, ,— X, it follows that F, = Fy + Èo — X, 
and F. = Fy + Xo. 


7.9. Observe that E[F, = F,] = El 2 Xie 2 |= Dp a ELX, JPlt < k]. 


Section 8


8.8. (b) With probability 1 the population either dies out or goes to infinity. If, for
example, $p_{k0} = 1 - p_{k,k+1} = 1/k^2$, then extinction and explosion each have
positive probability.


8.9. To prove that $x_i = 0$ is the only possibility in the persistent case, use Problem
8.5, or else argue directly: If $x_i = \sum_{j \ne i_0} p_{ij}x_j$, $i \ne i_0$, and $K$ bounds the $|x_i|$, then
$x_i = \sum p_{ij_1} \cdots p_{j_{n-1}j_n} x_{j_n}$, where the sum is over $j_1, \ldots, j_n$ distinct from $i_0$, and
hence $|x_i| \le K P_i[X_k \ne i_0, k \le n] \to 0$.


8.13. Let P be the set of i for which >; > 0, let N be the set of i for which 7; < 0, 


and suppose that P and N are both nonempty. For i; € P and jy € N choose n 
so that pí”) > 0. Then 


0 < » L m pi) = i = à ye app) 


JENER JEN JENIEN 
= >, T; e pi « 0. 
IN | jeP 


Transfer from N to P any i for which z; = 0 and use a similar argument. 


8.16. Denote the sets (8.32) and (8.52) by $P$ and by $F$, respectively. Since $F \subset P$,
$\gcd P \le \gcd F$. The reverse inequality follows from the fact that each integer in
$P$ is a sum of integers in $F$.


8.17. Consider the chain with states 0,1,... and a and transition probabilities 


Po; = fi, i for j20, pog=1—f, p;; =1 for ix 1, and p,,—1 (a is an 
absorbing state). The transition matrix is 


fi fa fs hs lay 
1 0 0 ert 0 
0 ja n0 0 
(p 2n en 1 


i p P n? I .l. Then a x: 
discard the state e and any states j such that f, = 0 for RA f = 1, 


Theorem 8.8. and apply 




8.19. 


8.22. 


8.23. 


8.24. 


8.25. 


8.27. 




In FELLER, Volume 1, the renewal theorem is proved by purely analytic 
means and is then used as the starting point for the theory of Markov chains. 
Here the procedure is the reverse. 


The transition probabilities are $p_{0r} = 1$ and $p_{i,r-i+1} = p$, $p_{i,r-i} = q$, $1 \le i \le r$;
the stationary probabilities are $u_1 = \cdots = u_r = q^{-1}u_0 = (r + q)^{-1}$. The chance
of getting wet is $u_0 p$, of which the maximum is $2r + 1 - 2\sqrt{r(r + 1)}$. For $r = 5$
this is .046, the pessimal value of $p$ being .523. Of course, $u_0 p < 1/4r$. In more
reasonable climates fewer umbrellas suffice: if $p = .25$ and $r = 3$, then $u_0 p =
.050$; if $p = .1$ and $r = 2$, then $u_0 p = .031$. At the other end of the scale, if
$p = .8$ and $r = 3$, then $u_0 p = .050$; and if $p = .9$ and $r = 2$, then $u_0 p = .043$.
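
The numerical values quoted in this note can be reproduced directly from the stationary probabilities. The sketch below is an illustration; the function names are hypothetical, and the closed form for the pessimal $p$, namely $(r + 1) - \sqrt{r(r + 1)}$, is obtained here by setting the derivative of $pq/(r + q)$ to zero rather than taken from the text.

```python
from math import sqrt

def wet_probability(p, r):
    """Stationary chance u_0 * p of getting wet with r umbrellas."""
    q = 1 - p
    u0 = q / (r + q)          # stationary probability of state 0
    return u0 * p

def worst_case(r):
    """Return the pessimal rain probability and the maximal chance of getting wet."""
    root = sqrt(r * (r + 1))
    p_star = (r + 1) - root   # critical point of p(1 - p)/(r + 1 - p)
    return p_star, 2 * r + 1 - 2 * root

p_star, maximum = worst_case(5)
examples = {
    (0.25, 3): wet_probability(0.25, 3),
    (0.1, 2): wet_probability(0.1, 2),
    (0.8, 3): wet_probability(0.8, 3),
    (0.9, 2): wet_probability(0.9, 2),
}
```

Rounded to three decimals, the entries of `examples` agree with the figures given for the four climates, and `worst_case(5)` recovers the pessimal $p$ and the maximum for five umbrellas.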


For the last part, consider the chain with state space C,, and transition 
probabilities p;; for i, j € C,, (show that they do add to 1). 


Let C' = $ — (TU C), and take U = TU C' in (8.51). The probability of absorp- 
tion in C is the probability of ever entering it, and for initial states í in T U C’ 
these probabilities are the minimal solution of 


n xb Lo Pig Ls puya) PP UC 
jeT jec ye 


O<y, <1, ieTuCc'. 


Since the states in C' (C' — is possible) are persistent and C is closed, it is 
impossible to move from C' to C. Therefore, in the minimal solution of the 
system above, y; — 0 for i € C'. This gives the system (8.55). It also gives, for the 
minimal solution, Y erPijY; * cp = 0, i€ C'. This makes probabilistic 
sense: for an i in C’, not only is it impossible to move to a j in C, it is 
impossible to move to a j in T for which there is positive probability of 
absorption in C. 


Fix on a state $i$, and let $S_\nu$ consist of those $j$ for which $p_{ij}^{(n)} > 0$ for some $n$
congruent to $\nu$ modulo $t$. Choose $k$ so that $p_{ji}^{(k)} > 0$; if $p_{ij}^{(m)}$ and $p_{ij}^{(n)}$ are
positive, then $t$ divides $m + k$ and $n + k$, so that $m$ and $n$ are congruent
modulo $t$. The $S_\nu$ are thus well defined.


Show that Theorem 8.6 applies to the chain with transition probabilities pi. 


(a) From PC = CA follows Pc; = A;c;, from RP = AR follows r;P = À;r,, and 
from RC = [ follows ric; = ó;;. Clearly A" is diagonal and P" = CA"R. Hence 
pj) = D O Rr E LUN Cur; FE AAs 

(b) By Problem 8.26, there are scalars p and y such that r, = pry = p(Tr y... , Ts) 
and c, = yco, where co is the column vector of 1’s. From ric, = 1 follows 
py = 1, and hence A, — c,r, = coro has rows (77,,...,7,). Of course, (8.56) gives 


€ DU rate of convergence. It is useful for numerical work; see CINLAR, pP. 


(c) Suppose all four p;; are positive. Then T; =p5,/( p>, + pix), T5 = Pi2/ (P1 
+ P1), the second eigenvalue is À = 1 —p12 — Pa and 




8.30.


8.36. 


Note that for given n and e, A” > 1 —« is possible, which means that the pf? 


are not yet near the ;. In the case of positive p;;, P is always diagonalizable by 


C 1 T3 Re T. T) | 
i uem l —] 


(d) For example, take t = +, 0 < e < t, and 


t t ( 
Pel t bo vs 


=e tae f 


In this case, 0 is an eigenvalue with algebraic multiplicity 2 and geometric 
multiplicity 1. 


Show that a, = 7; and 


Das MN De 
B, a, n(n- 1) mm T,( k)( pf; i) 


o Ene ]-ot 


where p is as in Theorem 8.9. 


The definitions give 


E| f( “ais E «n]f(0) + P[o, = n]f(i + n) 
=e SO) AIG, =n Siren @)) 


= Íl = Dao o japi ad e A a coc): 


and this goes to 1. Since P[r <n=o,]>P([7<n]N[X,>0, k>1)>1-— 
fio > 9, there is an n of the kind required in the last part of the problem. And 
now 


E|f(X,)] < P,[r «n =c, |f (i +n) ard =Ple<n =, ] 
=i = Pir <n =T] fian.o: 


$37. W iz 1, ny, <n, G, i tE, and (5... it n) el, then Plr=n,, 
T =n,] > P[ X, sz Í rna Kk «€ nj] > 0, which is impossible. 

Section 9 

9.3. See BAHADUR. 


9.7. Because of Theorem 9.6 there are for $P[M_n \ge a]$ bounds of the same order as
the ones for $P[S_n \ge a]$ used in the proof of (9.36).






Section 10 


10.7. Let $\mu_1$ be counting measure on the $\sigma$-field of all subsets of a countably infinite
$\Omega$, let $\mu_2 = 2\mu_1$, and let $\mathcal{P}$ consist of the cofinite sets. Granted the existence
of Lebesgue measure $\lambda$ on $\mathcal{R}^1$, one can construct another example: let $\mu_1 = \lambda$
and $\mu_2 = 2\lambda$, and let $\mathcal{P}$ consist of the half-infinite intervals $(-\infty, x]$.

There are similar examples with a field $\mathcal{F}_0$ in place of $\mathcal{P}$. Let $\Omega$ consist of
the rationals in (0, 1], let $\mu_1$ be counting measure, let $\mu_2 = 2\mu_1$, and let $\mathcal{F}_0$
consist of finite disjoint unions of "intervals" $[r \in \Omega\colon a < r \le b]$.


Section 11 


11.4. 


(b) If (f, £] C U, (fy, 8], then (f(w), g(9)] € U ,Cf4Co), f, (0)] for all w, and 
Theorem 1.3 gives g(w)—f(w) x X,(g, (v) —f,(@)). If Am = (Ge — fp ree. 
(g, — f,)) V 0, then h, 10 and g-f x X, Ern ard, The positivity and 
continuity of A now give vé f, g] « X,voC fj, g,]. A similar, easier argument 
shows that X,v« fr, g,] < v Cf, e] if (fp, gk] are disjoint subsets of (fug 


. (b) From (11.7) it follows that [f > 1] & A for f in -Z. Since -Z is linear, 


[f>x]and [f< —x] are in A for f € -Z and x > 0. Since the sets (x, o») and 
(—o, —x) for x > 0 generate .Z!, each f in -Z is measurable a (Fo). Hence 
F=a( Fy). 

It is easy to show that A, is a semiring and is in fact closed under the 
formation of proper differences. It can happen that Q € Fg—for example, in 
the case where Q = (1,2) and.Z consists of the f with f(1) = 0. See Jürgen 
Kindler: A simple proof of the Daniell-Stone representation theorem. Amer. 
Math. Monthly, 90 (1983), 396-397.) 


Section 12 


12.4. (a) If 6, = 0,,, then 0, _,, = 0 and n =m because 6 is irrational. Split G into 


12.5. 


12.6. 


12.8. 


finitely many intervals of length less than e; one of them must contain points 65, 

and 6,,, with 0,,,<9>,,- If k = m — n, then 0 < 0;,, — 0;, = 0,,, O 0;, = 05, < €, 

and the points 65,, for 1 </<|63,'| form a chain in which the distance from 

each to the next is less than e, the first is to the left of e, and the last is to the 

right of 1 — e. 

if If 5; 95. =07,41 © 05, @ 0;,, lies in the subgroup, then s, =s, and 02,41 = 
ny—nj) 


(a) The $ 6 6,, are disjoint, and (2n + 1)v + k = Qn + Dv' + k' with |k|, |k'| < n 
is impossible if v # v’. 

(b) The A 6 @,,,,,), are disjoint, contained in G, and have the same Lebesgue 
measure. 


See Example 2.10 (which applies to any finite measure). 


By Theorem 12.3 and Problem 2.19(b), A contains two disjoint compact sets of
arbitrarily small positive measure. Construct inductively compact sets K„,...u, 
(each u; is 0 or 1) such that 0 <u(Ky ...ue) €3 " and Ky. yp and A, n 


are disjoint subsets of K,,_,,. Take K = n Uu K... | The Cantor set is 
a special case. pee C |... Uy 




Section 13 


13.3. If f = Y;x;Il, and A;eT 'F', take Ai, in F' so that A, — T ‘Aj, and set 


¢ = E;x; ly. For the general f measurable T !. 7", there exist simple functions 
dis measurable T '¥', such that f,(w)-f(w) for each w. Choose o,, 
measurable F’, so that f, = o, T. Let C' be the set of o” for which q,(o') has 
a finite limit, and define o(o') — lim, o,(w') for w EC’ and o(»') = 0 for 
w' € C'. Theorem 20.1(ii) is a special case. 


. The class of Borel functions contains the continuous functions and is closed 


under pointwise passages to the limit and hence contains Z°. 

By imitating the proof of the m—A theorem, show that, if f and g lie in Z; 
then so do J tg, fe, f —g, f Vg (note that, for example, [g: f - g € 27] is 
closed under passages to the limit). If f(x) is 1 or 1 — n(x — o) or 0 as x «a 
or axxxa-*n ! or a*n ! xx, then f, is continuous and f(x) ^ 
I» (3. Show that [ A: 1, € 27] is a A-system. Conclude that 2” contains 
all indicators of Borel sets, all simple Borel functions, all Borel functions. 


13.13. Let B — (b,,..., b,), where k «n, E,= C—b; |4, and E= Uf ,E;. Then 


E=C-— UF b; !A. Since y is invariant under rotations, u(E;) = 1 — (A) < 
n !, and hence u(E) < 1. Therefore C — E 2 (17 b; 'A is nonempty. Use 
any 0 in C — E. 


Section 14 


14.3. 


14.4. 


14.5. 


(b) Since $u \le F(x)$ is equivalent to $\varphi(u) \le x$, it follows that $u \le F(\varphi(u))$. And
since $F(x) < u$ is equivalent to $x < \varphi(u)$, it follows further that $F(\varphi(u) - \epsilon) < u$
for positive $\epsilon$.
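
These equivalences can be checked on a small discrete example. The sketch below makes two assumptions that are not stated in this note: that $\varphi$ is the quantile function $\varphi(u) = \inf[x\colon u \le F(x)]$ for $0 < u < 1$, and that the three-atom distribution used here (purely hypothetical) is a fair test case.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Hypothetical discrete distribution: atoms and their probabilities.
ATOMS = [0, 1, 3]
PROBS = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
CUMULATIVE = list(accumulate(PROBS))  # values of F at the atoms

def F(x):
    """Distribution function F(x) = P[X <= x]."""
    return sum((p for a, p in zip(ATOMS, PROBS) if a <= x), Fraction(0))

def phi(u):
    """Quantile function: the smallest x with u <= F(x), for 0 < u < 1."""
    if not 0 < u < 1:
        raise ValueError("u must lie strictly between 0 and 1")
    # First atom whose cumulative probability reaches u.
    return ATOMS[bisect_left(CUMULATIVE, u)]

median = phi(Fraction(1, 2))
```

With exact fractions, the two equivalences and the two consequences in the note can be verified over a grid of values of $u$ and $x$.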


(a) If O cu «v «1, then Plu x F( X) «v, XeC]- P[oG0 x X < gv), X e 
C]. If e(u) € C, this is at most P[e(u) < X < e(v)] = F(e(u)— ) — F(g(u) — ) 
= F(g(v) —) — F(e(u)) x v — u; if e(u) EC, it is at most P[g(u) < X < g(v)] 
= F(g(v) —) — F(e(u)) < v — u. Thus P[F(X) €[u,v), Xe C] < A[u,v) if 0 < 
u <v < 1. This is true also for u = 0 (let u 0 and note that P[ F(X) = 0] = 0) 
and for v — 1 (let v 11). The finite disjoint unions of intervals [u,v) in [0, D 
form a field there, and by addition P[F( X) € A, X e C] < X(A) for A in this 
field. By the monotone class theorem, the inequality holds for all Borel sets in 
[0, 1). Since P[F( X) = 1, X € C] = 0, this holds also for A = (1). 


The sufficiency is easy. To prove necessity, choose continuity points $x_i$ of $F$ in
such a way that $x_0 < x_1 < \cdots < x_k$, $F(x_0) < \epsilon$, $F(x_k) > 1 - \epsilon$, and $x_i - x_{i-1} <
\epsilon$. If $n$ exceeds some $n_0$, $|F(x_i) - F_n(x_i)| < \epsilon/2$ for all $i$. Suppose that $x_{i-1} \le
x \le x_i$. Then $F_n(x) \le F_n(x_i) \le F(x_i) + \epsilon/2 \le F(x + \epsilon) + \epsilon/2$. Establish a similar
inequality going the other direction, and give special arguments for the
cases $x \le x_0$ and $x \ge x_k$.


Section 15 


15.1. 


Suppose there is an partition such that Xjsup, flu(A,) <ç. Then 
a; —sup,4 f «o for i in the set / of indices for which MCA) » 0. If a= 
max; a;, then u[f»a]- X,4CA; n lf » aD < XiuCA; n [f 5 aD = 0 Ana 
A, [f > 0] = for i outside the set J of indices for which L(A.) « o e that 
ul f » 0] 7 Eul A n Lf » 0) < X,uCA,) < o. voe 






15.4. Let (Q, Zt, *) be the completion (Problems 3.10 and 10.5) of (Q, F, u). If g 


is measurable F, [f = g] CÀ, À € F, w(A)=0, and H € .Z!, then [fe H]= 
(Asn[feHpu(An[feH) =(ASsn[ge HDUCA n[fe H) lesno 
and hence f is measurable F". 

(a) Since f is measurable Z *, it will be enough to prove that for each (finite) 
5 '-partition (Bj) there is an partition (A;) such that inf, f JuCA) > 
X jlinf, f lu^ (B) and to prove the dual relation for the upper sums. Choose 
(Problem 3.2) Fsets A A, so that A; C B; and u(A,) = u,(B,)= u *(B,). For 
the partition consisting of the A; together with (U ;A;,)°, the lower sum is at 
least £X [inf, f ]uCA;) = inf, flu "CB 

(b) Choose successively finer “partitions {A,,,} in such a way that the corre- 
sponding upper and lower sums differ by at most 1/n*. Let g, and f, have 
values inf, f and sup, f on A,;. Use Markov's inequality—since &(() is 
finite, it may as well be 1—to show that ul f, — g, 2 1/n] < 1/n?, and then use 
the first Borel-Cantelli lemma to show that f, — g, > 0 almost everywhere. 
Take g = lim, g,,. 


Section 16 


4633... 5. — yt f— f 


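16.4. (a) By Fatou's lemma,


$\int f\,d\mu - \int a\,d\mu = \int \lim_n (f_n - a_n)\,d\mu
\le \liminf_n \int (f_n - a_n)\,d\mu = \liminf_n \int f_n\,d\mu - \int a\,d\mu$

and

$\int b\,d\mu - \int f\,d\mu = \int \lim_n (b_n - f_n)\,d\mu
\le \liminf_n \int (b_n - f_n)\,d\mu = \int b\,d\mu - \limsup_n \int f_n\,d\mu.$

Therefore


$\limsup_n \int f_n\,d\mu \le \int f\,d\mu \le \liminf_n \int f_n\,d\mu.$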


16.6. For w € A and small enough complex h, 


|f (o, zo +h) - f(o, 29)| = f" ro, 2) dz 


EIU 


<= Ihlg(o, 20). 


16.8. Use the fact that P| du <ap(A) + Áüfiz adf lida. 




16.9. 


16.10. 


16.12. 


If uCA) < 8 implies /,|f,|du < e for all n, and if «^! sup, f|f,| dii < 6, then 
u[|f,|2z a] <a !f|f,ldu < 6 and hence fr, > alf,ldu «e for all n. For the 
reverse implication adapt the argument in the preceding note. 


(b) Suppose that f, are nonnegative and satisfy condition (ii) and p is 
nonatomic. Choose ó so that “(A)<6 implies f f, dj, <1 for all n. If 
pl f, = ©] > 0, there is an A such that A C[f,, = ©] and 0 < u( A) < ó; but then 
laf, du =œ. Since ul f, = <] = 0, there is an a such that ulf, >a] <ó < 
ul f, >a]. Choose B c [f, =a] in such a way that A —[f, > «]U B satisfies 
uCA) 26. Then aó =an(A)< f f, du <1 and ff,du<1+ap(A)<1+ 
ó !u(Q). 


(b) Suppose that fe. Z and f>0. If f, -(1—n Pf v 0, then f, €.7 and 
f, 1 f. so that v( fn f] = ACf — fn) 1 0. Since v( f, f ] < =, it follows that v[Co, t): 
f(@) = t] = 0. The disjoint union 


MT. i+] i 

B= U [|> s= Ge |*=) 

increases to B, where B c (0, f] and (0, f] = B c [(o, t): f(w) = t]. Therefore 
n2" 


ACf) = (0, f] = lim v(B,) = lim > zug < = M = |fdp. 


i=l 


Section 17 


17.1. 


17.10. 


17.11. 


(a) Let $A_\epsilon$ be the set of $x$ such that for every $\delta$ there are points $y$ and $z$
satisfying $|y - x| < \delta$, $|z - x| < \delta$, and $|f(y) - f(z)| \ge \epsilon$. Show that $A_\epsilon$ is closed
and $D_f$ is the union of the $A_\epsilon$.

(c) Given e and m, choose a partition into intervals T, for which the corre- 
sponding upper and lower sums differ by at most em. By considering those 7; 
whose interiors meet A,, show that em > €A(A,). 

(d) Let M bound |f| and, given e, find an open G such that D,C G and 
MG) < e/M. Take C = [0,1] — G and show by compactness that there is a ó 
such that | f(y) — f(x)| < if x (but perhaps not y) lies in C and |y —xl < à. If 
[0, 1] is decomposed into intervals J; with A(/;) < ô, and if x; € I; let g be the 
function with value f(x,) on I. Let Y denote summation over those ¿ for 
which J, meets C, and let ©” denote summation over the other i. Show that 


< [MG «G0 lax 


< 1/2eA(1)) + 1" 2MA(1,) « 4e. 


MOLE ENEDA 


(c) Do not overlook the possibility that points in (0, 1) — K converge to a point 
in K. 


(b) Apply the bounded convergence theorem to f,(x) = (1 — n dist(x,[s, (]) *. 


(c) The class of Borel sets B in [u,v] for which f = Iç satisfies (17 IRTA 
A-system. i 


S66 


17:512. 


NOTES ON THE PROBLEMS 


(e) Choose simple f, such that 0 « f, 1 f. To (17.8) for f—f,, apply the 
monotone convergence theorem on the right and the dominated convergence 


theorem on the left. 


If g(x) is the distance from x to [a,b], then J = —ng)V 0, Ia, b) and 
f, € -Z; since the continuous functions are measurable 2, it follows that 
F= R' If f.(x)10 for each x, then the compact sets [x: fax) 2 e] decrease 
to Ø and hence one of them is Ø; thus the convergence 1S uniform. 


17.13. The linearity and positivity of $\Lambda$ are certainly elementary facts, and for the
continuity property, note that if $0 \le f \le \epsilon$ and $f$ vanishes outside $[a,b]$, then
elementary considerations show that $0 \le \Lambda(f) \le \epsilon(b - a)$.

Section 18 

18.2. First, $\mathscr{X} \times \mathscr{X}$ is generated by the sets of the forms $\{x\} \times X$ and $X \times \{x\}$. If the
diagonal $E$ lies in $\mathscr{X} \times \mathscr{X}$, then there must be a countable $S$ in $X$ such that $E$
lies in the $\sigma$-field $\mathscr{F}$ generated by the sets of these two forms for $x$ in $S$. If $\mathscr{P}$
consists of $S^c$ and the singletons in $S$, then $\mathscr{F}$ is the class of unions of sets in
the partition $[P_1 \times P_2\colon P_1, P_2 \in \mathscr{P}]$. But $E \in \mathscr{F}$ is impossible.

18.3. Consider $A \times B$, where $A$ consists of a single point and $B$ lies outside the
completion of $\mathscr{R}^1$ with respect to $\lambda$.

18.17. Put $f_p = p^{-1}\log p$, and put $f_n = 0$ if $n$ is not a prime. In the notation of
(18.17), $F(x) = \log x + \varphi(x)$, where $\varphi$ is bounded because of (5.51). If $G(x) =
-1/\log x$, then

$$\sum_{p \le x} \frac{1}{p} = \frac{F(x)}{\log x} + \int_2^x \frac{F(t)\,dt}{t\log^2 t} = 1 + \frac{\varphi(x)}{\log x} + \int_2^x \frac{dt}{t\log t} + \int_2^x \frac{\varphi(t)\,dt}{t\log^2 t}.$$
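
The partial-summation identity in the note on 18.17 suggests that $\sum_{p \le x} 1/p$ differs from $\log\log x$ by a bounded amount. The following is only a numerical sanity check, not part of the argument; the function names `primes_up_to`, `reciprocal_prime_sum`, and `log_log_gap` are illustrative and do not come from the text.

```python
import math


def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Strike out the multiples of p starting at p*p.
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(count)
    return [n for n in range(limit + 1) if is_prime[n]]


def reciprocal_prime_sum(x):
    """Sum of 1/p over the primes p <= x."""
    return sum(1.0 / p for p in primes_up_to(x))


def log_log_gap(x):
    """Difference between the reciprocal prime sum and log log x."""
    return reciprocal_prime_sum(x) - math.log(math.log(x))


for x in (10**3, 10**4, 10**5):
    print(x, round(log_log_gap(x), 4))
```

The printed gaps stay in a narrow band as $x$ grows, which is what boundedness of $\varphi$ leads one to expect.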
Section 19 
19.3. See BANACH, p. 34. 


19.4. 


19.5. 


(a) Take f = 0 and f, = 1, 


(b) Take $f = 0$, and let $(f_n)$ be an infinite orthonormal set. Use the fact that
$\sum_n (f_n, g)^2 \le \|g\|^2$.


Take $f_n = nI_{(0,1/n)}$ and suppose that $f_{n_k}$ converges weakly to some $f$ in $L^1$.
Integrate against the $L^\infty$-functions $\operatorname{sgn} f \cdot I_{(\epsilon,1)}$ and conclude that $f = 0$ almost
everywhere; now integrate against the function identically 1 and get a contra-
diction.


Section 20 


20.4. Suppose $U_1,\ldots,U_k$ are independent and uniformly distributed over the unit
interval, put $V_i = 2nU_i - n$, and let $\mu_n$ be $(2n)^k$ times the distribution of
$(V_1,\ldots,V_k)$. Then $\mu_n$ is supported by $Q_n = (-n,n] \times \cdots \times (-n,n]$, and if
$I = (a_1,b_1] \times \cdots \times (a_k,b_k] \subset Q_n$, then $\mu_n(I) = \prod_{i=1}^k (b_i - a_i)$. Further, if
$A \subset Q_n \subset Q_m$ $(n < m)$, then $\mu_n(A) = \mu_m(A)$. Define $\lambda_k(A) = \lim_n \mu_n(A \cap Q_n)$.

20.7. By the argument preceding (8.16),

$$P_i[T_1 = n_1, \ldots, T_k = n_k] = f_{ij}^{(n_1)} f_{jj}^{(n_2 - n_1)} \cdots f_{jj}^{(n_k - n_{k-1})}.$$

For the general initial distribution, average over $i$.

20.8. (a) Use the $\pi$-$\lambda$ theorem to show that $P[(X_{\pi 1},\ldots,X_{\pi n}) \in H]$ is the same for
all permutations $\pi$.

(b) Use part (a) and the fact that $Y_n = r$ if and only if $T_r^{(n)} = n$.

(c) If $k \le n$, then $Y_k = r$ if and only if exactly $r - 1$ among the integers
$1,\ldots,k-1$ precede $k$ in the permutation $T^{(n)}$.

(d) Observe that $T^{(n)} = (t_1,\ldots,t_n)$ and $Y_{n+1} = r$ if and only if $T^{(n+1)} =
(t_1,\ldots,t_{r-1}, n+1, t_r,\ldots,t_n)$, and conclude that $\sigma(Y_{n+1})$ is independent of
$\sigma(T^{(n)})$ and hence of $\sigma(Y_1,\ldots,Y_n)$; see Problem 20.6.

20.12. If $X$ and $Y$ are independent, then

$$P[|(X + Y) - (x + y)| < \epsilon] \ge P[|X - x| < \tfrac12\epsilon]\,P[|Y - y| < \tfrac12\epsilon]$$

and

$$P[X + Y = x + y] \ge P[X = x]\,P[Y = y].$$

20.14. The partial-fraction expansion gives

$$c_u(y - x)\,c_v(x) = \frac{1}{\pi^2}\,\frac{uv}{R}\,(A + B + C + D),$$

where $R = (u^2 - v^2)^2 + 2(u^2 + v^2)y^2 + y^4$ and

$$A = \frac{y^2 + v^2 - u^2}{u^2 + (y - x)^2}, \qquad B = \frac{2y(y - x)}{u^2 + (y - x)^2}, \qquad C = \frac{y^2 + u^2 - v^2}{v^2 + x^2}, \qquad D = \frac{2yx}{v^2 + x^2}.$$

After the fact this can of course be checked mechanically. Integrate over
$[-t,t]$ and let $t \to \infty$: $\int_{-t}^{t} D\,dx = 0$, $\int_{-t}^{t} B\,dx \to 0$, and $\int_{-\infty}^{\infty} (A + C)\,dx = (y^2 +
v^2 - u^2)u^{-1}\pi + (y^2 + u^2 - v^2)v^{-1}\pi = \pi^2 (uv)^{-1} R\,c_{u+v}(y)$. There is a very
simple proof by characteristic functions; see Problem 26.9.

20.16. See Example 20.1 for the case $n = 1$, prove by inductive convolution and a
change of variable that the density must have the form $K_n x^{(n/2)-1}e^{-x/2}$, and
then from the fact that the density must integrate to 1 deduce the form of $K_n$.

20.17. Show by (20.38) and a change of variable that the left side of (20.48) is some
constant times the right side; then show that the constant must be 1.




20.20. (a) Given $\epsilon$ choose $M$ so that $P[|X| > M] < \epsilon$ and $P[|Y| > M] < \epsilon$, and
then choose $\delta$ so that $|x|, |y| \le M$, $|x - x'| < \delta$, and $|y - y'| < \delta$ imply that
$|f(x', y') - f(x, y)| < \epsilon$. Note that $P[|f(X_n, Y_n) - f(X, Y)| \ge \epsilon] \le 2\epsilon + P[|X_n -
X| \ge \delta] + P[|Y_n - Y| \ge \delta]$.

20.23. Take, for example, independent $X_n$ assuming the values 0 and $n$ with proba-
bilities $1 - n^{-1}$ and $n^{-1}$. Estimate the probability that $X_k = k$ for some $k$ in
the range $n/2 < k \le n$.

20.24. (b) For each $m$ split $A$ into $2^m$ sets $A_{mk}$ of probability $P(A)/2^m$. Arrange all
the $A_{mk}$ in one infinite sequence, and let $X_n$ be the indicator of the $n$th set
in it.

20.27. To get the distribution of $\Phi$, show by integration that for $0 \le \phi \le \tfrac12\pi$, the
intersection with the unit ball of the $(x_1, x_2, x_3)$-set where $0 \le x_3 \le (x_1^2 +
x_2^2)^{1/2}\tan\phi$ has volume $\tfrac23\pi\sin\phi$.

Section 21 

21.5. Consider ET, A random variable is finite with probability 1 if (but not only if) 
it is integrable. 

21.6. Calculate $\int_0^\infty x\,dF(x) = \int_0^\infty \int_0^x dy\,dF(x) = \int_0^\infty \int_y^\infty dF(x)\,dy$.

21.8. (a) Write PY XS Dx s zy U dR y y t GG 

21.10. (a) The most important dependent uncorrelated random variables are the
trigonometric functions: the random variables $\sin 2\pi n\omega$ and $\cos 2\pi n\omega$ on the
unit interval with Lebesgue measure. See Problem 19.8.

21.13. Use Fubini’s theorem; see (20.29) and (20.30). 

21.14. Even if $X = -Y$ is not integrable, $X + Y = 0$ is. Since $|Y| \le |x| + |x + Y|$,
$E[|Y|] = \infty$ implies that $E[|x + Y|] = \infty$ for each $x$; use Problem 21.13. See also
the lemma in Section 28.

21.21. Use (21.12). 

Section 22 

22.2. For sufficiency, use E[X| Xl] = LEXON. 
22.8. 


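(a) Put $U = \sum_k I_{[\tau \ge k]} X_k^+$ and $V = \sum_k I_{[\tau \ge k]} X_k^-$, so that $S_\tau = U - V$. Since
$[\tau \ge k] = \Omega - [\tau \le k - 1]$ lies in $\sigma(X_1,\ldots,X_{k-1})$, it follows that $E[I_{[\tau \ge k]} X_k^+] =
E[I_{[\tau \ge k]}]\,E[X_k^+] = P[\tau \ge k]\,E[X_1^+]$. Hence $E[U] = \sum_{k=1}^\infty E[X_1^+]\,P[\tau \ge k] =
E[X_1^+]\,E[\tau]$. Treat $V$ the same way.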

(b) To prove E[7] < œ, show that P[r > (a +b)n] < (1 -pf*"y By (7.2.58. 


is b with probability (1 — p*)/(1 — p^*^) and —a with the opposite probability. 
Since E[ X] = p — q, 




22.11. 


22.14. 


22.15. 


For each $\theta$, $\sum_n e^{iX_n}(e^{i\theta}z)^n$ has the same probabilistic behavior as the original
series, because the $X_n + n\theta$ reduced modulo $2\pi$ are independent and uni-
formly distributed. Therefore, the rotation idea in the proof of Theorem 22.9
carries over. See KAHANE for further results.


(b) Let $A = f^{-1}B$ and suppose $p$ is a period of $f$. Let $m = \lfloor x/p \rfloor$ and
$n = \lfloor 1/p \rfloor$. By periodicity, $P(A \cap [y, y + p])$ is the same for all $y$; there-
fore, $|P(A \cap [0,x]) - mP(A \cap [0,p])| \le p$, $|P(A) - nP(A \cap [0,p])| \le p$, and
$|P(A \cap [0,x]) - P(A)x| \le 2p + |x - m/n| \le 3p$. Since $p$ can be taken arbitrar-
ily small, $P(A \cap [0,x]) = P(A)x$.


(a) By the inequalities L(s) « M(s) and (22.24), 
B,-(2s) - 1^3L(2s) < 3M(2s) € 3Bo(s). 


For the other inequality, note first that 7(s) is nonincreasing and that R(2s) < 
2L(s) and T(s) < L(s) < M(s). If B,(s) < 1/3, then 


T(6s) T(3s) M(3s) 
Bo(6s) < T= ROS) 1- 2PC 012423) 


B,(s) 
ES Te25 05 < 3B,(s). 


On the other hand, if B,(s) > 1/3, then Bo(6s) x 1 < 3B&(s). In either case, 
Bo(6s) < 3B, (s). 


Section 23 


23.3. 


23.4. 


23.6. 


23.8. 


23.9. 


Note that $A_t$ cannot exceed $t$. If $0 \le u < t$ and $v \ge 0$, then $P[A_t > u, B_t > v] =
P[N_{t+v} - N_{t-u} = 0] = e^{-\alpha u}e^{-\alpha v}$.


(a) Use (20.37) and the distributions of A, and B,. 
(b) A long interarrival interval has a better chance of covering $t$ than a short
one does.


The probability that Nç. — Nç = J is 


ETT EE a*B)  (j*k- 1)! 
Bx KN OLX = 
fe WE D e (qug) ESD 


Let $M_t$ be the given process and put $\varphi(t) = E[M_t]$. Since there are no fixed
discontinuities, $\varphi(t)$ is continuous. Let $\psi(u) = \inf[t\colon u \le \varphi(t)]$, and show that
$N_u = M_{\psi(u)}$ is an ordinary Poisson process and $M_t = N_{\varphi(t)}$.


Let t > © in 


pole cae Nel Det | 
S. SNE N 


zz 


23.11. Restrict $t$ in Problem 23.10 to integers. The waiting times are the $Z_n$ of
Problem 20.7, and account must be taken of the fact that the distribution of $Z_1$
may differ from that of the other $Z_n$.


Section 25 


25.1. 


2S.2. 


25.3. 


25.9. 


25.10. 


25.11. 


25.13. 


25.20. 


(e) Let $G$ be an open set that contains the rationals and satisfies $\lambda(G) < \tfrac12$.
For $k = 0,1,\ldots,n-1$, construct a triangle whose base contains $k/n$ and is
contained in G: make these bases so narrow that they do not overlap, and 
adjust the heights of the triangles so that each has area 1/n. For the nth 
density, piece together these triangular functions, and for the limit density, use 
the function identically 1 over the unit interval. 


By Problem 14.8 it suffices to prove that $F_n(\cdot, \omega) \Rightarrow F$ with probability 1, and
for this it is enough that $F_n(x, \omega) \to F(x)$ with probability 1 for each rational $x$.


(b) It can be shown, for example, that (25.14) holds for x, =n!. See Persi 
Diaconis: The distribution of leading digits and uniform distribution mod 1, 
Ann. Prob., 5 (1977), 72-81. 

(c) The first significant digits of numbers drawn at random from empirical
compilations such as almanacs and engineering handbooks seem approximately
to follow the limiting distribution in (25.15) rather than the uniform distribu-
tion over 1,2,...,9. This is sometimes called Benford's law. One explanation is
that the distribution of the observation $X$ and hence of $\log_{10} X$ will be spread
over a large interval; if $\log_{10} X$ has a reasonably smooth density, it then seems
plausible that $\{\log_{10} X\}$ should be approximately uniformly distributed. See
FELLER, Volume 2, p. 62.
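
The first-digit phenomenon described in parts (b) and (c) can be illustrated numerically. The sketch below assumes that the limiting distribution in (25.15) is the usual logarithmic law $\log_{10}(1 + 1/d)$ for the digit $d$, and it uses the sequence $2^n$ (a choice made here for illustration; the note itself mentions $x_n = n!$). The helper names are hypothetical.

```python
import math
from collections import Counter


def leading_digit(n):
    """First significant decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 among the values."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def benford(d):
    """Logarithmic first-digit law (assumed here to be the law in (25.15))."""
    return math.log10(1 + 1 / d)


powers_of_two = [2**n for n in range(1, 2001)]
frequencies = leading_digit_frequencies(powers_of_two)
for d in range(1, 10):
    print(d, round(frequencies[d], 4), round(benford(d), 4))
```

The observed frequencies are far from the uniform value 1/9 and track the logarithmic law closely.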


Use Scheffé's theorem. 

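Put $f_n(x) = P[X_n = y_n + k\delta_n]\,\delta_n^{-1}$ for $y_n + k\delta_n < x \le y_n + (k + 1)\delta_n$. Construct
random variables $Y_n$ with densities $f_n$, and first prove $Y_n \Rightarrow X$. Show that
$Z_n = y_n + \lfloor (Y_n - y_n)/\delta_n \rfloor \delta_n$ has the distribution of $X_n$ and that $Y_n - Z_n \Rightarrow 0$.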
For a proof of (25.16) see FELLER, Volume 1, Chapter 7. 


(b) Follow the proof of Theorem 25.8, but approximate I, „| instead of 
LE š 


Let $X_n$ assume the values $n$ and 0 with probabilities $p_n = 1/(n\log n)$ and
$1 - p_n$.


Section 26 


26.1. (b) Let $\mu$ be the distribution of $X$. If $|\varphi(t)| = 1$ and $t \ne 0$, then $\varphi(t) = e^{ita}$ for
some $a$, and $0 = \int_{-\infty}^{\infty} (1 - e^{it(x-a)})\,\mu(dx) = \int_{-\infty}^{\infty} (1 - \cos t(x - a))\,\mu(dx)$. Since the
integral vanishes, $\mu$ must confine its mass to the points where the nonnegative
integrand vanishes, namely to the points $x$ for which $t(x - a) = 2\pi n$ for some
integer $n$.

(c) The mass of $\mu$ concentrates at points of the form $a + 2\pi n/t$ and also at
points of the form $a' + 2\pi n/t'$. If $\mu$ is positive at two distinct points, it follows
that $t/t'$ is rational.




26.3. 


26.12. 


26.15. 


26.17. 
26.19. 


26.22. 


26.25. 


(a) Let $f_0(x) = \pi^{-1}x^{-2}(1 - \cos x)$ be the density corresponding to $\varphi_0(t)$. If
$p_k = (s_k - s_{k+1})t_k$, then $\sum_{k=1}^\infty p_k = 1$; since $\sum_{k=1}^\infty p_k\varphi_0(t/t_k) = \varphi(t)$ (check the
points $t = t_j$), $\varphi(t)$ is the characteristic function of the continuous density
$\sum_{k=1}^\infty p_k t_k f_0(t_k x)$.

(b) If $\lim_{t \to \infty} \varphi(t) = 0$, approximate $\varphi$ by functions of the kind in part (a), pass
to the limit, and use the first corollary to the continuity theorem. If $\varphi$ does not
vanish at infinity, mix in a unit mass at 0.


On the right in (26.30) replace $\varphi(t)$ by the integral defining it and apply
Fubini's theorem; the integral average comes to

$$\mu\{a\} + \int_{x \ne a} \frac{\sin T(x - a)}{T(x - a)}\,\mu(dx).$$

Now use the bounded convergence theorem.


(a) Use (26.4$_0$) to prove that $|\varphi_n(t + h) - \varphi_n(t)| \le 2\mu_n(-a, a)^c + a|h|$.
(b) Use part (a). 


(a) Use the second corollary to the continuity theorem. 
For the Weierstrass approximation theorem, see RUDIN₁, Theorem 7.32.


(a) If $a_n$ goes to 0 along a subsequence, then $|\varphi(t)| \equiv 1$; use part (c) of
Problem 26.1.


(c) Suppose two subsequences of $(a_n)$ converge to $a_0$ and $a$, where $0 < a_0 < a$;
put $\theta = a_0/a$ and show that $|\varphi(t)| = |\varphi(\theta^k t)|$.


(d) Observe that 


—1 
b, = —ifet’n—1 | “els, as 
[ ] f 


First do the nonnegative case; then note that if $f$ and $g$ have the same
coefficients, so do $f^+ + g^-$ and $g^+ + f^-$.


Section 27 


27.8. By the same reasoning as in Example 27.3, $(R_n - \log n)/\sqrt{\log n} \Rightarrow N$.

27.9. The Lindeberg theorem applies: $(S_n - n^2/4)/\sqrt{n^3/36} \Rightarrow N$.

27.11. Let $Y_n$ be $X_n$ or 0 according as $|X_n| \le n^{1/2}\log n$ or not. Show that $X_n = Y_n$ for
large $n$, with probability 1, and that Lyapounov's theorem ($\delta = 1$) applies to
the $Y_n$.

27.12. For example, let the distribution of $X_n$ be the mixture, with weights $1 - n^{-2}$
and $n^{-2}$, of the standard normal and Cauchy distributions.

27.16. Write $\int_x^\infty e^{-u^2/2}\,du = x^{-1}e^{-x^2/2} - \int_x^\infty u^{-2}e^{-u^2/2}\,du$.
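
The centering $n^2/4$ and scaling $n^3/36$ in the note on 27.9 can be checked against exact moments under one concrete reading of the problem. The sketch below assumes (this is not stated in the note) that $S_n = X_1 + \cdots + X_n$ with the $X_k$ independent and $X_k$ uniform on $\{0, 1, \ldots, k-1\}$, so that $E[X_k] = (k-1)/2$ and $\mathrm{Var}[X_k] = (k^2 - 1)/12$. The function names are illustrative.

```python
from fractions import Fraction


def uniform_sum_moments(n):
    """Exact mean and variance of S_n = X_1 + ... + X_n, where the X_k are
    assumed independent with X_k uniform on {0, ..., k-1}."""
    mean = sum(Fraction(k - 1, 2) for k in range(1, n + 1))
    variance = sum(Fraction(k * k - 1, 12) for k in range(1, n + 1))
    return mean, variance


def normalization_ratios(n):
    """Ratios of the exact mean and variance to n^2/4 and n^3/36."""
    mean, variance = uniform_sum_moments(n)
    return float(mean / Fraction(n * n, 4)), float(variance / Fraction(n**3, 36))


for n in (10, 100, 1000):
    print(n, normalization_ratios(n))
```

Both ratios tend to 1, so under this assumption replacing the exact mean and variance by $n^2/4$ and $n^3/36$ does not affect the limit.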




27.17. For another approach to large-deviation theory, see Mark Pinsky: An elemen- 
tary derivation of Khintchine’s estimate for large deviations, Proc. Amer. 
Math. Soc., 22 (1969), 288—290. 


27.19. (a) Everything comes from (47) If A -[U,...,/,)€ H] and Be 
ONE, s te ape gys 9, THEN 


|P(A OB) — P( A)P(B)| 
< L P([ =i,,.u<k]OB) - P[I; =i,,4 <k]P(B)|, 


where the sum extends over the k-tuples (i, ..., ix) of nonnegative integers in 
H. The summand vanishes if u +i, < k + n for u < k; the remaining terms add 
to at most 20, - PU, >k+n-u]<4/2". 


(b) To show that $\sigma^2 = 6$ (see (27.20)), show that $l_1$ has mean 1 and variance 2
and that


PIT IDE n n 
f lili+n dP = Us di | " Ey a 
[=i] P[h il n) then: 


Section 28 


28.2. (b) Pass to a subsequence along which $\nu_n(R^1) \to \infty$, choose $\epsilon_n$ so that it
decreases to 0 and $\epsilon_n\nu_n(R^1) \to \infty$, and choose $x_n$ so that it increases to $\infty$ and
$\nu_n(-x_n, x_n) > \tfrac12\nu_n(R^1)$; consider the $f$ that satisfies $f(\pm x_n) = \epsilon_n$ for all $n$ and
is defined by linear interpolation in between these points.


28.4. (a) If all functions (28.12) are characteristic functions, they are all certainly 
infinitely divisible. Since (28.12) is continuous at 0, it need only be exhibited as 
a limit of characteristic functions. If u,„ has density diss. nj + x?) with respect 
to v, then 


exp| iy +i f” ; abn dr) + a (Qe == ix), (de) 


is a characteristic function and converges to (28.12). It can also be shown that
every infinitely divisible distribution (no moments required) has characteristic
function of the form (28.12); see GNEDENKO & KOLMOGOROV, p. 76.


(b) Use (see Problem 18.19) $-|t| = \pi^{-1}\int_{-\infty}^{\infty} (\cos tx - 1)x^{-2}\,dx$.


28.14. If $X_1, X_2, \ldots$ are independent and have distribution function $F$, then $(X_1 +
\cdots + X_n)/\sqrt{n}$ also has distribution function $F$. Apply the central limit theo-
rem.


28.15. The characteristic function of Z, is 


; oo Hk 3 
exp — e z(e/^^^—1)  expef  ——— ax 
z m (Ik|/n) 1° ( ) f |x|!** 


= exp| cle" f» Í — cos x a. 


hejse 




Section 29 


29.1. (a) If $f$ is lower semicontinuous, $[x\colon f(x) > t]$ is open. If $f$ is positive, which is
no restriction, then $\int f\,d\mu = \int_0^\infty \mu[f > t]\,dt \le \int_0^\infty \liminf_n \mu_n[f > t]\,dt \le
\liminf_n \int_0^\infty \mu_n[f > t]\,dt = \liminf_n \int f\,d\mu_n$.

(b) If $G$ is open, then $I_G$ is lower semicontinuous.

29.7. Let $\Sigma$ be the covariance matrix. Let $M$ be an orthogonal matrix such that the
entries of $M\Sigma M'$ are 0 except for the first $r$ diagonal entries, which are 1. If
$Y = MX$, then $Y$ has covariance matrix $M\Sigma M'$, and so $Y = (Y_1,\ldots,Y_r,0,\ldots,0)$,
where $Y_1,\ldots,Y_r$ are independent and have the standard normal distribution.
But $|X|^2 = \sum_{i=1}^r Y_i^2$.

29.8. By Theorem 29.5, $X_n$ has asymptotically the centered normal distribution with
covariances $\sigma_{ij}$. Put $x = (p_1^{1/2},\ldots,p_k^{1/2})$ and show that $\Sigma x' = 0$, so that 0 is an
eigenvalue of $\Sigma$. Show that $\Sigma y' = y'$ if $y$ is perpendicular to $x$, so that $\Sigma$ has 1
as an eigenvalue of multiplicity $k - 1$. Use Problem 29.7 together with Theo-
rem 29.2 ($h(x) = |x|^2$).
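
The two eigenvalue claims in the note on 29.8 are easy to confirm numerically. The sketch below assumes that the covariances are $\sigma_{ij} = \delta_{ij} - \sqrt{p_i p_j}$ (the form suggested by the vector $x$ in the note, but not written out there); the probability vector and the helper names are chosen only for illustration.

```python
import math


def covariance_matrix(p):
    """Matrix with entries delta_ij - sqrt(p_i * p_j) (assumed form of sigma_ij)."""
    root = [math.sqrt(value) for value in p]
    size = len(p)
    return [
        [(1.0 if i == j else 0.0) - root[i] * root[j] for j in range(size)]
        for i in range(size)
    ]


def apply_matrix(matrix, vector):
    """Matrix-vector product using plain lists."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


p = [0.5, 0.3, 0.2]
sigma = covariance_matrix(p)
x = [math.sqrt(value) for value in p]   # eigenvector for eigenvalue 0
y = [x[1], -x[0], 0.0]                  # a vector perpendicular to x

print(apply_matrix(sigma, x))  # approximately the zero vector
print(apply_matrix(sigma, y))  # approximately y itself
```

Because the $p_i$ sum to 1, the vector $x$ has unit length, which is why $\Sigma x' = x' - x'(xx') = 0$.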

29.9. (a) Note that $n^{-1}\sum_{i=1}^n Y_i^2 \Rightarrow 1$ and that $(X_{n1},\ldots,X_{nt})$ has the same distribu-
tion as $(Y_1,\ldots,Y_t)/(n^{-1}\sum_{i=1}^n Y_i^2)^{1/2}$.

Section 30 
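30.1. Rescale so that $s_n^2 = 1$, and put $L_n(\epsilon) = \sum_k \int_{|X_{nk}| \ge \epsilon} X_{nk}^2\,dP$. Choose increasing
$n_u$ so that $L_n(u^{-1}) \le u^{-3}$ for $n \ge n_u$, and put $M_n = u^{-1}$ for $n_u \le n < n_{u+1}$.
Then $M_n \to 0$ and $L_n(M_n) \le M_n^3$. Put $Y_{nk} = X_{nk}I_{[|X_{nk}| < M_n]}$. Show that
$\sum_k E[Y_{nk}] \to 0$ and $\sum_k E[Y_{nk}^2] \to 1$, and apply to $\sum_k Y_{nk}$ the central limit theo-
rem under (30.5). Show that $\sum_k P[X_{nk} \ne Y_{nk}] \to 0$.


30.4.


30.5.


30.6.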


Suppose that the moment generating function $M_n$ of $\mu_n$ converges to the
moment generating function $M$ of $\mu$ in some interval about $s$. Let $\nu_n$ have
density $e^{sx}/M_n(s)$ with respect to $\mu_n$, and let $\nu$ have density $e^{sx}/M(s)$ with
respect to $\mu$. Then the moment generating function of $\nu_n$ converges to that of
$\nu$ in some interval about 0, and hence $\nu_n \Rightarrow \nu$. Show that $\int_{-\infty}^{\infty} f(x)\,\mu_n(dx) \to
\int_{-\infty}^{\infty} f(x)\,\mu(dx)$ if $f$ is continuous and has bounded support; see Problem
25.13(b).


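(a) By Hölder's inequality $|\sum_j t_j x_j|^r \le k^{r-1}\sum_j |t_j x_j|^r$, and so
$\sum_r \theta^r \int |\sum_j t_j x_j|^r\,\mu(dx)/r!$ has positive radius of convergence. Now

$$\int \Bigl(\sum_{j=1}^k t_j x_j\Bigr)^r \mu(dx) = \sum \frac{r!}{r_1! \cdots r_k!}\, t_1^{r_1} \cdots t_k^{r_k}\,\alpha(r_1,\ldots,r_k),$$

where the summation extends over $k$-tuples that add to $r$. Project $\mu$ to the line
by the mapping $x \to \sum_j t_j x_j$, apply Theorem 30.1, and use the fact that $\mu$ is
determined by its values on half-spaces.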


Use the Cramér-Wold idea. 


30.8. Suppose that $k = 2$ in (30.30). Then

$$M[(\cos\lambda_1 x)^{r_1}(\cos\lambda_2 x)^{r_2}] = 2^{-r_1-r_2}\sum_{j_1=0}^{r_1}\sum_{j_2=0}^{r_2}\binom{r_1}{j_1}\binom{r_2}{j_2}\,M[\exp i(\lambda_1(2j_1 - r_1) + \lambda_2(2j_2 - r_2))x].$$

By (26.33) and the independence of $\lambda_1$ and $\lambda_2$, the last mean here is 1 if
$2j_1 - r_1 = 2j_2 - r_2 = 0$ and is 0 otherwise. A similar calculation for $k = 1$ gives
(30.28), and a similar calculation for general $k$ gives (30.30). The actual form of
the distribution in (30.29) is unimportant. For (30.31) use the multidimensional
method of moments (Problem 30.6) and the mapping theorem. For (30.32) use
the central limit theorem; by (30.28), $X_1$ has mean 0 and variance $\tfrac12$.

30.10. If $n^{1/2} \le m \le n$ and the inequality in (30.33) holds, then $\log\log n^{1/2} \le
\log\log n - \epsilon(\log\log n)^{1/2}$, which implies $\log\log n \le \epsilon^{-2}\log^2 2$. For large $n$ the
probability in (30.33) is thus at most $1/\sqrt{n}$.


Section 31 


31.1. 


31.3. 


31.9. 


Consider the argument in Example 31.1. Suppose that $F$ has a nonzero
derivative at $x$, and let $I_n$ be the set of numbers whose base-$r$ expansions
agree in the first $n$ places with that of $x$. The analogue of (31.16) is $P[X \in
I_{n+1}]/P[X \in I_n] \to r^{-1}$, and the ratio here is one of $p_0,\ldots,p_{r-1}$. If $p_i \ne r^{-1}$ for
some $i$, use the second Borel-Cantelli lemma to show that the ratio is $p_i$
infinitely often except on a set of Lebesgue measure 0. (This last part of the
argument is unnecessary if $r = 2$.)

The argument in Example 31.3 needs no essential change. The analogue of 
(31.17) is 


FIS) po bp Pp rt) 0<i<r-1. 


d 
r RE: 
(b) Take f,;=J,-1,, and f, = F; (fi f.) (1) = Hç is not a Lebesgue set. 


Suppose that $A$ is bounded, define $\mu$ by $\mu(B) = \lambda(B \cap A)$, and let $F$ be the
corresponding distribution function. It suffices to show that $F'(x) = 1$ for $x$ in
$A$, apart from a set of Lebesgue measure 0. Let $C_\epsilon$ be the set of $x$ in $A$ for
which $F'(x) \le 1 - \epsilon$. From Theorem 31.4(i) deduce that $\lambda(C_\epsilon) = \mu(C_\epsilon) \le (1 -
\epsilon)\lambda(C_\epsilon)$ and hence $\lambda(C_\epsilon) = 0$. Thus $F'(x) > 1 - \epsilon$ almost everywhere on $A$.
Obviously, $F'(x) \le 1$.


31.11. Let $A$ be the set of $x$ in the unit interval for which $F'(x) = 0$, take $\alpha = 0$, and
define $A_n$ as in the first part of the proof of Theorem 31.4. Choose $n$ so that
$\lambda(A_n) \ge 1 - \epsilon$. Split $\{1,2,\ldots,n\}$ into the set $M$ of $k$ for which $((k-1)/n, k/n]$
meets $A_n$ and the opposite set $N$. Prove successively that $\sum_{k \in M}[F(k/n) -
F((k-1)/n)] \le \epsilon$, $\sum_{k \in N}[F(k/n) - F((k-1)/n)] \ge 1 - \epsilon$, $\sum_{k \in M} 1/n \ge \lambda(A_n)
\ge 1 - \epsilon$, $\sum_{k=1}^n |f(k/n) - f((k-1)/n)| \ge 2 - 2\epsilon$.




31.15. 


31.18. 


31.22. 


nia esa). 


For $x$ fixed, let $u_n$ and $v_n$ be the pair of successive dyadic rationals of order $n$
($v_n - u_n = 2^{-n}$) for which $u_n < x \le v_n$. Show that

$$\frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} \frac{a_k(v_n) - a_k(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} a_k'(x),$$

where $a_k'$ is the left-hand derivative. Since $a_k'(x) = \pm 1$ for all $x$ and $k$, the
difference ratio cannot have a finite limit.


Let A be the x-set where (31.35) fails if f is replaced by fy; then A has 
Lebesgue measure 0. Let G be the union of all open sets of «-measure 0; 
represent G as a countable disjoint union of open intervals, and let B be G 
together with any endpoints of zero w~-measure of these intervals. Let D be the 
set of discontinuity points of F. If F(x) € A, x € B, and x € D, then F(x — h) 
< F(x) < F(x +h), F(x + h) > F(x), and 


1 F(x +h) 


E a aCe eID): 


Now x —e < e(F(x)) <x follows from F(x — e) < F(x), and hence g(F(x)) = 
x. If A is Lebesgue measure restricted to (0,1), then u = A@ `!, and (31.36) 
follows by change of variable. But (36.36) is easy if x € D, and hence it holds 
outside B U (D° n F `!A). But (B) = 0 by construction and (D° N F !A)= 0 
by Problem 14.4. 


Section 32 


32.7. 


32.8. 


32.9. 


Define u, and v, as in (32.7) and write v, — v? + v9, where vi” is 
absolutely continuous with respect to u„ and v! is singular with respect to 
jt l Maker) =) vU) and vie Y, vt^» 

Suppose that v, (E) + vE) = v, (E) + v; (E) for all E in F. Choose an $ 
in Z that supports v, and v; and satisfies u(S) —0. Then v,(E)= 
y (E n $°) = y UE $*) + v E (NS e y UE N S°) + v'(E (Se v (E ^ 
$^) = v4 CE). A similar argument shows that v (E) = v; (E). 


(a) Show that @ is closed under the formation of countable unions, choose 
@-sets B, such that u(B,,) > supg u(B) (< œ), and take By = U, B,- 

(b) The same argument. 

(c) Suppose (D) > 0. The maximality of By implies that By U D, contains 
an E such that u( E) > 0 and v(E) < œ. Since B, N EC By € B, (BN E) 0 
G(E) « o rules out v(B)M E) = œ). Therefore, (D; 1 E) » 0 and 
v( Do N E) < =, which contradicts the maximality of Co. 

(d) Take the density to be o on QS. 


Define f and v, as in (32.8), and let f° and v? be the corresponding func- 
tion and measure for F°: v(E) - fcf? du + v; CE) for E € F°, and there 
is an F°-set $^ such that v? (Q — S°)=0 and u(S°)=0. If E € F°, it fol- 
lows that fcf? du = fg s f^ du = fe-s f° d =v°(E—-S°)=v(E-S§°)> 
fe-s fdu = fefdp. 


It is instructive to consider the extreme case $\mathscr{F}^\circ = \{\emptyset, \Omega\}$, in which $\nu^\circ$ is
absolutely continuous with respect to $\mu^\circ$ (provided $\mu(\Omega) > 0$) and hence $\nu_s^\circ$
vanishes.


Section 33 


33.2. 


33.3. 


33.6. 


33.15. 


(a) To prove independence, check the covariance. Now use Example 33.7. 

(b) Use the fact that $R$ and $\Theta$ are independent (Example 20.2).

(c) As the single event $[X = Y] = [X - Y = 0] = [\Theta = \pi/4] \cup [\Theta = 5\pi/4]$ has
probability 0, the conditional probabilities have no meaning, and strictly
speaking there is nothing to resolve. But whether it is natural to regard the
degrees of freedom as one or as two depends on whether the 45° line through
the origin is regarded as an element of the decomposition of the plane into 45°
lines or whether it is regarded as the union of two elements of the decomposi-
tion of the plane into rays from the origin.

Borel's paradox can be explained the same way: The equator is an element
of the decomposition of the sphere into lines of constant latitude; the Green-
wich meridian is an element of the decomposition of the sphere into great
circles with common poles. The decomposition matters, which is to say the
$\sigma$-field matters.


(a) If the guard says, "1 is to be executed," then the conditional probability
that 3 is also to be executed is $1/(1 + p)$. The "paradox" comes from assuming
that $p$ must be 1, in which case the conditional probability is indeed $\tfrac12$. But if
$p \ne \tfrac12$, then the guard does give prisoner 3 some information.


(b) Here "one" and "other" are undefined, and the problem ignores the
possibility that you have been introduced to a girl. Let the sample space be

$$\begin{array}{llll}
bbo\colon \alpha/4 & bgo\colon \beta/4 & gbo\colon \gamma/4 & ggo\colon \delta/4 \\
bby\colon (1-\alpha)/4 & bgy\colon (1-\beta)/4 & gby\colon (1-\gamma)/4 & ggy\colon (1-\delta)/4
\end{array}$$

For example, $bgo$ is the event (probability $\beta/4$) that the older child is a boy,
the younger is a girl, and the child you have been introduced to is the older;
and $ggy$ is the event (probability $(1 - \delta)/4$) that both children are girls and the
one you have been introduced to is the younger. Note that the four sex
distributions do have probability $\tfrac14$ each. If the child you have been introduced to is
a boy, then the conditional probability that the other child is also a boy is
$p = 1/(2 + \beta - \gamma)$. If $\beta = 1$ and $\gamma = 0$ (the parents present a son if they have
one), then $p = \tfrac13$. If $\beta = \gamma$ (the parents are indifferent), then $p = \tfrac12$. Any $p$
between $\tfrac13$ and 1 is possible.
This problem shows again that one must keep in mind the entire experi-
ment the sub-$\sigma$-field $\mathscr{G}$ represents, not just one of the possible outcomes of
the experiment.
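
The dependence of the answer in part (b) on the parents' presentation rule can be made concrete with exact arithmetic. The sketch below computes the conditional probability directly from the eight-point sample space described in the note; the function name `prob_other_is_boy` is hypothetical, and $\alpha$ and $\delta$ are omitted because they do not enter the calculation.

```python
from fractions import Fraction


def prob_other_is_boy(beta, gamma):
    """P(other child is a boy | child introduced is a boy).

    beta  = chance the older child is presented in a boy-girl family,
    gamma = chance the older child is presented in a girl-boy family.
    Each of the four sex distributions has probability 1/4.
    """
    beta, gamma = Fraction(beta), Fraction(gamma)
    quarter = Fraction(1, 4)
    both_boys = quarter                      # bbo and bby together
    boy_from_bg = beta * quarter             # bgo: the older (boy) is presented
    boy_from_gb = (1 - gamma) * quarter      # gby: the younger (boy) is presented
    return both_boys / (both_boys + boy_from_bg + boy_from_gb)


print(prob_other_is_boy(1, 0))                            # son always presented
print(prob_other_is_boy(Fraction(1, 2), Fraction(1, 2)))  # indifferent parents
print(prob_other_is_boy(0, 1))                            # daughter always presented
```

The three calls cover the two cases named in the note and the opposite extreme, which together span the range of possible values.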


There is no problem, unless the notation gives rise to the illusion that $p(A|x)$
is $P(A \cap [X = x])/P[X = x]$.


If N is a standard normal variable, then 


1 


sob) ae re epe 




Section 34 


34.3. If $(X, Y)$ takes the values $(0,0)$, $(1,-1)$, and $(1,1)$ with probability $\tfrac13$ each, then
$X$ and $Y$ are dependent but $E[Y\|X] = E[Y] = 0$.
If $(X, Y)$ takes the values $(-1,1)$, $(0,-2)$, and $(1,1)$ with probability $\tfrac13$ each,
then $E[X] = E[Y] = E[XY] = 0$ and so $E[XY] = E[X]E[Y]$, but $E[Y\|X] =
Y \ne 0 = E[Y]$. Of course, this is another example of dependent but uncorre-
lated random variables.
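
Both examples in the note on 34.3 are small enough to verify by direct enumeration. The sketch below does so with exact rational arithmetic; the helper names are illustrative, and the conditional expectation is represented as a dictionary mapping each value of $X$ to the corresponding value of $E[Y\|X]$.

```python
from fractions import Fraction


def expectation(dist, g):
    """E[g(X, Y)] for a finite joint distribution {(x, y): probability}."""
    return sum(prob * g(x, y) for (x, y), prob in dist.items())


def conditional_expectation_of_y(dist):
    """Map each value x of X to E[Y | X = x]."""
    totals, weights = {}, {}
    for (x, y), prob in dist.items():
        totals[x] = totals.get(x, 0) + prob * y
        weights[x] = weights.get(x, 0) + prob
    return {x: totals[x] / weights[x] for x in totals}


third = Fraction(1, 3)
first = {(0, 0): third, (1, -1): third, (1, 1): third}
second = {(-1, 1): third, (0, -2): third, (1, 1): third}

# First example: E[Y || X] vanishes identically although X and Y are dependent.
print(conditional_expectation_of_y(first))
# Second example: uncorrelated, yet E[Y || X] equals Y itself.
print(expectation(second, lambda x, y: x * y), conditional_expectation_of_y(second))
```

In the first example dependence is visible from $P[X = 1, Y = 1] = \tfrac13$, whereas $P[X = 1]P[Y = 1] = \tfrac29$.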


34.4. First show that /fdP, = f5f dP/P(B) and that P[Bll-27] » 0 on a set of Po- 
measure 1. Let G be the general set in Y. 


(a) Since 
f Pol AlZ]P[ BIA] aP = | PAIS ]s dP = f 15 P Alg ] aP 
G G B 
- P(B) f 1 Po Allg] Po = P(B)Po( 4G) 
= | P[An BI.g]4P, 
G 


it follows that 


P,[ All Z]P[ Bll] = PLA n BII-z] 


holds on a set of P-measure 1. 
(b) If P(A) = PCAI|B,), then 


f Pl AlZ]dP = P(B,) f IP AlZ] dP, = P(B;)P;(A NG) 
GNB; Q 
=f P[Al#v # Jae. 
GOB; 


Therefore, felg P[AllY]dP = felg PIAIIGVY #]dP if C — Gn B,, and of 
course this holds for C = G OB; if j= i. But C's of this form constitute a 
T-system generating YV X, and hence I, PLAIS] — Ig P[ AllZv Z] on a 
set of P-measure 1. Now use the result in part (a). 


34.9. All such results can be proved by imitating the proofs for the unconditional
case or else by using Theorem 34.5 (for part (c), as generalized in Problem
34.7). For part (a), it must be shown that it is possible to take the integral
measurable $\mathscr{G}$.


34.10. (a) If Y= X — E[ X|I.Z], then X — E[XIl-£,] - Y EYA], and E[(Y — 
ELY NA, VNA] = ELY 2-45] — E2[Y |Z] < E[Y 2?|ll-Z]. Take expected values. 


34.11. First prove that 


P[A, NAIF] =E[14,P[AsllA2] A]. 




34.16. 


34.17. 




From this and (i) deduce (ii). From 
E|r, PL All A] |] = P[ Ail-*; ]P[ As A], 


(ii), and the preceding equation deduce 
f P[Al.Z,]ap = f P[Al.Z,]4P. 
ANA? A104) 


The sets 4, A, form a m-system generating Z. 


(a) Obviously (34.18) implies (34.17). If (34.17) holds, then clearly (34.18) holds 
for X simple. For the general X, choose simple X, such that lim, X, = X and 
| X,| x |X|. Note that 


| XaP-a xar! 
An 


< 


+ (1+lal) E[IX — X;I]; 


J X. P -a | X, aP 


let n — o and then let k > œ. 


(b) If Q € Z, then the class of E satisfying (34.17) is a A-system, and so by 
the 7—A theorem and part (a), (34.18) holds if X is measurable o (4). Since 
A, € a (2), it follows that 


f, xaP- [El Xllo(A)]4P >a f E[ X\lo(P)] aP 
=a [XaP. 


(c) Replace X by XdP)/dP in (34.18). 


(a) The Lindeberg-Lévy theorem.
(b) Chebyshev's inequality. 

(c) Theorem 25.4. 

(d) Independence of the X,. 

(e) Problem 34.16(b). 

(f) Problem 34.16(c). 


(g) Part (b) here and the $\epsilon$-$\delta$ definition of absolute continuity.
(h) Theorem 25.4 again. 


Section 35 


35.4. (b) Let $S_n$ be the number of $k$ such that $1 \le k \le n$ and $Y_k = \tfrac32$. Then
$X_n = 3^{S_n}/2^n$. Take logarithms and use the strong law of large numbers.




35.9. Ia K bound | X, | and the | X, — X, |. Bound | X,| by Kr. Write f, < , X, dP = 


35.17. 


So ws i dP == San. (Qy awa dP — xa ~ i 1X; dP). Transform the last integral 
by the martingale property and reduce the expression to E[ X,] — [-5.Xx41 dP. 
Now 


«K(k* 1)P[r» k] sK(k c k^! f cdP— 0. 


T»5k 


| NNUS 


. (a) By the result in Problem 32.9, X,, X5,... is a supermartingale. Since 


E[| X,l] = E[ X,] < v(Q), Theorem 35.5 applies. 
(b If AEF, then MO, + Z)dP * o,(A) = f[AX,dP + o ( A) 2 v(A) 5 
[4 X, dP + a, CA). Since the Lebesgue decomposition is unique (Problem 32.7), 
Y, +Z,= X, with probability 1. Since X, and Y, converge, so does Z,. If 
AEF, and n> k, then /.Z, dP < o,( A), and by Fatou's lemma, the limit Z 
satisfies /,ZdP < o,( A). This holds for A in U ,.7, and hence (monotone 
class theorem) for A in Z. Choose A so that P(A)=1 and c( A) = 0: 
E[Z] = [(aZdP < o,( A) - 0 

It can happen that c,(Q) = 0 and o,(0) = v(Q) > 0, in which case c, does 
not converge to o, and the X, cannot be integrated to the limit. 


For a very general result, see J. L. Doob: Application of the theory of 
martingales, Le Calcul des Probabilités et ses Applications (Colloques Interna- 
tionaux du Centre de la Recherche Scientifique, Paris, 1949). 


Section 36 


36.5. 


(b) Show by part (a) and Problem 34.18 that f, is the conditional expected 
value of f with respect to the o-field .7,,, generated by the coordinates 
Xn p Xga2:--- . By Theorem 35.9, (36.30) will follow if each set in (1,2, has 
T-measure either 0 or 1, and here the zero—one law applies. 


(c) Show that $g_n$ is the conditional expected value of $f$ with respect to the
$\sigma$-field generated by the coordinates $x_1,\ldots,x_n$, and apply Theorem 35.6.

36.7. Let $\mathscr{L}$ be the countable set of simple functions $\sum_i \alpha_i I_{A_i}$ for $\alpha_i$ rational and
$\{A_i\}$ a finite decomposition of the unit interval into subintervals with rational
endpoints. Suppose that the $X_t$ exist, and choose (Theorem 17.1) $Y_t$ in $\mathscr{L}$ so
that $E[|X_t - Y_t|] < \tfrac14$. From $E[|X_s - X_t|] = \tfrac12$, conclude that $E[|Y_s - Y_t|] > 0$ for
$s \ne t$. But there are only countably many of the $Y_t$. It does no good to replace
Lebesgue measure by some other measure on the unit interval.

Section 37 
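37.1. If $t_1,\ldots,t_k$ are in increasing order and $t_0 = 0$, then

$$\sum_{i,j} K(t_i, t_j)\,x_i x_j = \sum_{i,j} x_i x_j \sum_{l=1}^{\min\{i,j\}} (t_l - t_{l-1}) = \sum_{l=1}^{k} (t_l - t_{l-1})\Bigl(\sum_{i \ge l} x_i\Bigr)^2 \ge 0.$$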
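
The identity in the note on 37.1 can be spot-checked with exact arithmetic. The sketch below assumes that $K(s,t) = \min\{s,t\}$, as the inner sum over $l \le \min\{i,j\}$ indicates; the function names and the sample values are illustrative only.

```python
from fractions import Fraction


def quadratic_form(times, xs):
    """Sum over i, j of min(t_i, t_j) * x_i * x_j."""
    return sum(
        min(ti, tj) * xi * xj
        for ti, xi in zip(times, xs)
        for tj, xj in zip(times, xs)
    )


def sum_of_squares_form(times, xs):
    """Sum over l of (t_l - t_{l-1}) * (x_l + ... + x_k)^2 with t_0 = 0.

    The times must be listed in increasing order.
    """
    total = Fraction(0)
    previous = Fraction(0)
    for index, t in enumerate(times):
        tail = sum(xs[index:])
        total += (t - previous) * tail * tail
        previous = t
    return total


times = [Fraction(1, 2), Fraction(1), Fraction(5, 2)]
xs = [3, -4, 1]
print(quadratic_form(times, xs), sum_of_squares_form(times, xs))
```

The second form is a sum of nonnegative terms, which is the whole point of the rearrangement: nonnegative definiteness becomes visible without any computation.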




37.4 (a) Use Problem 36.6(b). 
(b) Let [W;: t > 0] be a Brownian motion on (Q, Z, Po), where W(-,w) € C 
for every wi Define ¢: OAR’ by Z,(E(w)) = Ww). Show that £ is mea- 
surable ¥/@™ and P=P,é-'. If C cA € A’, then PCA) = Pog" A) = 


37.5. Consider $W(1) = \sum_{k=1}^n (W(k/n) - W((k-1)/n))$ for notational convenience.
Since

$$n\int_{|W(1/n)| \ge \epsilon} W^2(1/n)\,dP = \int_{|W(1)| \ge \epsilon\sqrt{n}} W^2(1)\,dP \to 0,$$

the Lindeberg theorem applies.


37.14. By symmetry, 


p(s.r) =2P|W, > 0, inf (EW y= Svar 


ssucst 


W. and the infimum here are independent because of the Markov property, 
and so by (20.30) (and symmetry again) 


1 
y27s 


p(s.) =2f Pr, <: =s] me eg. 


1 —x2 /2u 1 e x s du dx. 


DUE Z Z= 
0 70 2m u?” 27S 


Reverse the integral, use Je xe de dx = 1/r, and put v = (s/(s + u))1⁄2: 


12s" nas 1⁄2 
PD m mf aes uia d 


22 p dv 


uS 


Bibliography 


HALMOS and SAKS have been the strongest measure-theoretic and DOOB and
FELLER the strongest probabilistic influences on this book, and the spirit of
KAC's small volume has been very important.


AuBREY: Brief Lives, John Aubrey; ed., O. L. Dick. Seker and Warburg, 
London, 1949. 


BAHADUR: Some Limit Theorems in Statistics, R. R. Bahadur. SIAM, 
Philadelphia, 1971. 


BANACH: Théorie des Opérations Linéaires, S. Banach. Monografje
Matematyczne, Warsaw, 1932.


BERGER: Statistical Decision Theory, 2nd ed., James O. Berger. Springer- 
Verlag, New York, 1985. 


BHATTACHARYA & WAYMIRE: Stochastic Processes with Applications, Rabi 
N. Bhattacharya and Edward C. Waymire. Wiley, New York, 1990. 


BILLINGSLEY₁: Convergence of Probability Measures, Patrick Billingsley.
Wiley, New York, 1968. 


BILLINGSLEY₂: Weak Convergence of Measures: Applications in Probability,
Patrick Billingsley. SIAM, Philadelphia, 1971. 


BIRKHOFF & MAC LANE: A Survey of Modern Algebra, 4th ed., Garrett
Birkhoff and Saunders Mac Lane. Macmillan, New York, 1977. 


ÇINLAR: Introduction to Stochastic Processes, Erhan Çinlar. Prentice-Hall,
Englewood Cliffs, New Jersey, 1975. 


CRAMÉR: Mathematical Methods of Statistics, Harald Cramér. Princeton
University Press, Princeton, New Jersey, 1946. 


DOOB: Stochastic Processes, J. L. Doob. Wiley, New York, 1953.


DUBINS & SAVAGE: How to Gamble If You Must, Lester E. Dubins and
Leonard J. Savage. McGraw-Hill, New York, 1965. 




DUDLEY: Real Analysis and Probability, Richard M. Dudley. Wadsworth
and Brooks, Pacific Grove, California, 1989. 

DYNKIN & YUSHKEVICH: Markov Processes, English ed., Evgenii B. Dynkin
and Aleksandr A. Yushkevich. Plenum Press, New York, 1969. 

FELLER: An Introduction to Probability Theory and Its Applications, Vol. I,
3rd ed., Vol. II, 2nd ed., William Feller. Wiley, New York, 1968, 1971.

GALAMBOS: The Asymptotic Theory of Extreme Order Statistics, Janos
Galambos. Wiley, New York, 1978. 

GELBAUM & OLMSTED: Counterexamples in Analysis, Bernard R. Gelbaum 
and John M. Olmsted. Holden-Day, San Francisco, 1964. 

GNEDENKO & KOLMOGOROV: Limit Distributions for Sums of Independent
Random Variables, English ed., B. V. Gnedenko and A. N. Kolmogorov. 
Addison-Wesley, Reading, Massachusetts, 1954. 

HALMOS₁: Measure Theory, Paul R. Halmos. Van Nostrand, New York,
1950. 

HALMOS₂: Naive Set Theory, Paul R. Halmos. Van Nostrand, Princeton,
1960. 

HARDY: A Course of Pure Mathematics, 9th ed., G. H. Hardy. Macmillan,
New York, 1946. 

HARDY & WRIGHT: An Introduction to the Theory of Numbers, 4th ed.,
G. H. Hardy and E. M. Wright. Clarendon, Oxford, 1959. 

HAUSDORFF: Set Theory, 2nd English ed., Felix Hausdorff. Chelsea, New 
York, 1962. 

JECH: Set Theory, Thomas Jech. Academic Press, New York, 1978.

Kac: Statistical Independence in Probability, Analysis and Number Theory, 
Carus Math. Monogr. 12, Marc Kac. Wiley, New York, 1959. 

KAHANE: Some Random Series of Functions, Jean-Pierre Kahane. Heath, 
Lexington, Massachusetts, 1968. 

KAPLANSKY: Set Theory and Metric Spaces, Irving Kaplansky. Chelsea, New 
York, 1972. 

KARATZAS & SHREVE: Brownian Motion and Stochastic Calculus, Ioannis
Karatzas and Steven E. Shreve. Springer-Verlag, New York, 1988.

KARLIN & TAYLOR: A First Course in Stochastic Processes, 2nd ed., A 
Second Course in Stochastic Processes, Samuel Karlin and Howard M. 
Taylor. Academic Press, New York, 1975 and 1981. 

KOLMOGOROV: Grundbegriffe der Wahrscheinlichkeitsrechnung, Erg. Math.,
Vol. 2, No. 3, A. N. Kolmogorov. Springer-Verlag, Berlin, 1933. 

LÉVY: Théorie de l'Addition des Variables Aléatoires, Paul Lévy.
Gauthier-Villars, Paris, 1937.


PROTTER: Stochastic Integration and Differential Equations, Philip Protter. 
Springer-Verlag, 1990. 




RIESZ & SZ.-NAGY: Functional Analysis, English ed., Frigyes Riesz and
Béla Sz.-Nagy. Ungar, New York, 1955.


ROCKETT & SZÜSZ: Continued Fractions, Andrew M. Rockett and Peter
Szüsz. World Scientific, Singapore, 1992. 


ROYDEN: Real Analysis, 2nd ed., H. L. Royden. Macmillan, New York, 1968.


RUDIN₁: Principles of Mathematical Analysis, 3rd ed., Walter Rudin.
McGraw-Hill, New York, 1976. 


RUDIN₂: Real and Complex Analysis, 2nd ed., Walter Rudin. McGraw-Hill,
New York, 1974. 


SAKS: Theory of the Integral, 2nd rev. ed., Stanislaw Saks. Hafner, New
York, 1937. 


SPIVAK: Calculus on Manifolds, Michael Spivak. W. A. Benjamin, New
York, 1965. 


WAGON: The Banach-Tarski Paradox, Stan Wagon. Cambridge University
Press, 1985. 


List of Symbols 


sgn x, 537 
A − B, 536
A^c, 536

A △ B, 536
A ⊂ B, 536
Fset, 20 
By, 20 

σ(𝒜), 21
00923 

4, 22 


(Ω, ℱ, P), 23


A_n ↑ A, 536
A_n ↓ A, 536


λ, 25, 43, 168


∧, 537
∨, 537
FCA), 33 
P(A), 35 
D(A), 35 
2, 35 

PE 39..47 
Po: 7; 47 
SE 27.311 
2, 4] 
2,4) 

A*. 44 


P(B|A), 51 

lim sup_n A_n, 52
lim inf_n A_n, 52
lim_n A_n, 52

i.o., 53

ZA PL 287 

R', 537 

R*, 539 

[PX G7 

σ(X), 68, 255

μ, 73, 160, 256
E[X], 76, 273
Var[ X ], 78, 275 
EFL 87 

s, Ca), 93 

T, 99, 133, 464, 508 
pj, 111 

SENI 

a;, 111 

Ti, 124 

M(t), 146, 278, 285 
2k 158 

AN No, 159 
Bev. 1160537 

16 = Evia 160 

ii IGS 

(u*), 165 

A,, 168 

A 

F 175, 177 
A4F,.176 

T. l4. 531 
F/F 182 

F_n ⇒ F, 191, 327, 378
dF(x), 228
X × Y, 231
𝒳 × 𝒴, 231
μ × ν, 233




*, 266 

X_n →_P X, 70, 268, 330
‖f‖, 249

‖f‖_p, 241

L^p, 241

μ_n ⇒ μ, 327, 378
X_n ⇒ X, 329, 378
X_n ⇒ a, 331

∂A, 538

φ(t), 342

fi opos 974 

F,, 414 




F,., 414 

ν ≪ μ, 422
dν/dμ, 423

ν_s, 424

ν_ac, 424

P[A‖𝒢], 428, 430
P[A‖X_t, t ∈ T], 433
E[X‖𝒢], 445

R^T, 484

ℛ^T, 485

W_t, 498


Index 


Here An refers to paragraph n of the Appendix (p. 536); u.v refers to Problem v in Section u,
or else to a note on it (Notes on the Problems, p. 552); the other references are to pages. Greek
letters are alphabetized by their Roman equivalents (m for μ, and so on). Names in the
bibliography are not indexed separately.


Absolute continuity, 413, 422 
Absolutely continuous part, 425 
Absolute moment, 274 
Absorbing state, 112 

Adapted σ-fields, 458

Additive set function, 420 


Additivity: 
countable, 23, 161 
finite, 23, 161 


Admissible, 248, 252 

Affine transformation, 172 
Algebra, 19 

Almost everywhere, 60 

Almost surely, 60 

α-mixing, 363, 29.10

Aperiodic, 125 

Approximation of measure, 168 
Area over the curve, 79 

Area under the curve, 203 
Asymptotic equipartition property, 91, 144 
Asymptotic relative frequency, 8 
Atom, 271 

Autoregression, 495 

Axiom of choice, 21, 45 


Baire category, 1.10, A15 
Baire function, 13.7 

Banach limits, 3.8, 19.3 
Banach space, 243 
Banach-Tarski paradox, 180 
Bayes estimation, 475 

Bayes risk, 248, 251 

Benford law, 25.3 


Beppo Levi theorem, 16.3 
Bernoulli-Laplace model of diffusion, 112
Bernoulli shift, 311 

Bernoulli trials, 75 

Bernstein polynomial, 87 

Betting system, 98 

Binary digit, 3 

Binomial distribution, 256 
Blackwell-Rao theorem, 455 
Bold play, 102 

Boole's inequality, 25 

Borel, 9 

Borel-Cantelli lemmas, 59, 60
Borel function, 183 

Borel normal number theorem, 9 
Borel paradox, 441 

Borel set, 22, 158 

Boundary, A11 

Bounded convergence theorem, 210 
Bounded variation, 415 
Branching process, 461 
Britannica, 552 

Brownian motion, 498 

Burstin's theorem, 22.14 


Canonical measure, 372 
Canonical representation, 372 
Cantelli inequality, 5.5 

Cantelli theorem, 6.6 

Cantor function, 31.2, 31.15 
Cantor set, 1.5 

Cardinality of σ-fields, 2.12, 2.22
Cartesian product, 231 




Category, A15, 1.10 

Cauchy distribution, 20.14, 348 

Cauchy equation, A20, 14.7 

Cavalieri principle, 18.8 

Central limit theorem, 291, 357, 385, 391, 
34.17, 475 

Cesàro averages, A30, 20.23

Change of variable, 215, 224, 225, 274 

Characteristic function, 342 

Chebyshev inequality, 5, 80, 276 

Chernoff theorem, 151 

Chi-squared distribution, 20.15 

Chi-squared statistic, 29.8 

Circular Lebesgue measure, 13.12, 313 

Class of sets, 18 

Closed set, A11

Closed set of states, 8.21 

Closed support, 12.9 

Closure, A11

Cocountable set, 21 

Cofinite set, 20 

Collective, 109 

Compact, A13 

Complement, A1

Completely normal number, 6.13 

Complete space or measure, 44, 10.5 

Completion, 3.10, 10.5 

Complex functions, integration of, 218 

Compound Poisson distribution, 28.3 

Compound Poisson process, 32.7 

Concentrated, 161 

Conditional distribution, 439, 449 

Conditional expected value, 133, 445 

Conditional probability, 51, 427, 33.5 

Congruent by dissection, 179 

Conjugate index, 242 

Conjugate space, 244 

Consistency conditions for finite-dimensional 
distributions, 483 

Content, 3.15 

Continued-fraction transformation, 319, A36 

Continuity from above, 25 

Continuity of paths, 500 

Continuum hypothesis, 46 

Conventions involving ∞, 160

Convergence in distribution, 329, 378 

Convergence in mean, 243 

Convergence in measure, 268 

Convergence in probability, 70, 268, 330 

Convergence with probability 1, 70, 330 

Convergence of random series, 289 

Convergence of types, 193 

Convex functions, A32 

Convolution, 266 




Coordinate function, 27, 484 
Coordinate variable, 484 
Countable, 8 

Countable additivity, 23, 161 
Countable subadditivity, 25, 162 
Countably generated σ-field, 2.11
Countably infinite, 8 

Counting measure, 161 
Coupled chain, 126 

Coupon problem, 362 
Covariance, 277 

Cover, A3 

Cramér-Wold theorem, 383 
Cylinder, 27, 485 


Daniell-Stone theorem, 11.14, 16.12 
Darboux-Young definition, 15.2
A-distribution, 192 
Decision theory, 247 
Decomposition, A3 
de Finetti theorem, 473 
Definite integral, 200 
Degenerate distribution function, 193 
Delta method, 359 
DeMoivre-Laplace theorem, 25.11, 358
DeMorgan law, A6 
Dense, A15 
Density of measure, 213, 422 
Density point, 31.9 
Density of random variable or distribution, 
257, 260 

Density of set of integers, 2.18 
Denumerable probabilities, 51 
Dependent random variables, 363 
Derivatives of integrals, 402 
Diagonal method, 29, A14 
Difference equation, A19 
Difference set, A1
Diophantine approximation, 13, 324 
Dirichlet theorem, 13, A26 
Discontinuity of the first kind, 534 
Discrete measure, 23, 161 
Discrete random variable, 256 
Discrete space, 1.1, 23, 5.16 
Disjoint, A3 
Disjoint supports, 410, 421 
Distribution: 

of random variable, 73, 187, 256 

of random vector, 259 
Distribution function, 175, 188, 256, 259, 409 
Dominated convergence theorem, 78, 209 
Dominated measure, 422 
Double exponential distribution, 348 
Double integral, 233 




Double series, A27 

Doubly stochastic matrix, 8.20 
Dual space, 245 
Dubins-Savage theorem, 102 
Dyadic expansion, 3, A31 
Dyadic interval, 4 

Dyadic transformation, 313 
Dynkin's π-λ theorem, 42


ε-δ definition of absolute continuity, 422
Egorov theorem, 13.9 

Eigenvalues, 8.26 

Empirical distribution function, 268 
Empty set, A1 

Entropy, 57, 6.14, 8.31, 31.17 
Equicontinuous, 355 

Equivalence class, 58 

Equivalent measures, 422 

Erdős-Kac central limit theorem, 395
Ergodic theorem, 314 

Erlang density, 23.2 

Essential supremum, 241 

Estimation, 251, 452 

Etemadi, 282, 288, 22.15 

Euclidean distance, A16 

Euclidean space, A1, A16 

Euler function, 2.18 

Event, 18 

Excessive function, 134 

Exchangeable, 473 

Existence of independent sequences, 73, 265 
Existence of Markov chains, 115 
Expected value, 76, 273 

Exponential convergence, 131, 8.18 
Exponential distribution, 189, 258, 297, 348 
Extension of measure, 36, 166, 11.1 
Extremal distribution, 195 


Factorization and sufficiency, 450 
Fair game, 92, 463 

Fatou lemma, 209 

Field, 19, 2.5 

Filtration, 458 

Finite additivity, 20, 23, 2.15, 3.8, 161 
Finite or countable, 8 
Finite-dimensional distributions, 308, 482 
Finite-dimensional sets, 485 

Finitely additive field, 20 

Finite subadditivity, 24, 162 

First Borel-Cantelli lemma, 59 

First category, 1.10, A15 

First passage, 118 

Fixed discontinuity, 303

Fourier representation, 250 




Fourier series, 351, 26.30 

Fourier transform, 342 

Frequency, 8 

Fubini theorem, 233 

Functional central limit theorem, 522 

Fundamental in probability, 20.21 

Fundamental set, 320 

Fundamental theorem of calculus, 224, 400 

Fundamental theorem of Diophantine 
approximation, 324 


Gambling policy, 98 

Gamma distribution, 20.17 
Gamma function, 18.18 
Generated σ-field, 21
Glivenko-Cantelli theorem, 269
Goncharov’s theorem, 361 


Hahn decomposition, 420 

Hamel basis, 14.7 
Hardy-Ramanujan theorem, 6.16 
Heine-Borel theorem, A13, A17
Hewitt-Savage zero-one law, 496 
Hilbert space, 249 

Hitting time, 136 

Hölder's inequality, 80, 5.9, 242, 276
Hypothesis testing, 151 


Inadequacy of ℛ^T, 492

Inclusion-exclusion formula, 24, 163 

Indefinite integral, 400 

Identically distributed, 85 

Independent classes, 55 

Independent events, 53 

Independent increments, 299, 498 

Independent random variables, 71, 261 

Independent random vectors, 263 

Indicator, A5 

Infinitely divisible distributions, 371 

Infinitely often, 53 

Infinite series, A25 

Information, 57 

Initial digit problem, 25.3 

Initial probabilities, 111 

Inner boundary, 64 

Inner measure, 37, 3.2 

Integrable, 200, 206 

Integral, 199 

Integral with respect to Lebesgue measure, 
221 

Integrals of derivatives, 412 

Integration by parts, 236 

Integration over sets, 212 

Integration with respect to a density, 214 




Interior, A11
Interval, A9
Invariance principle, 520
Invariant set, 313
Inverse image, A7, 182
Inversion formula, 346
Irreducible chain, 119
Irregular paths, 504
Iterated integral, 233


Jacobian, 225, 261, 545
Jensen inequality, 80, 276, 449
Jordan decomposition, 421
Jordan measurable, 3.15


k-dimensional Borel set, 158
k-dimensional Lebesgue measure, 171, 177, 17.14, 20.4
Kolmogorov existence theorem, 483
Kolmogorov zero-one law, 63, 287


Landau notation, A18
Laplace distribution, 348
Laplace transform, 285
Large deviations, 148
Lattice distribution, 26.1
Law of the iterated logarithm, 153
Law of large numbers:
strong, 9, 11, 85, 282
weak, 5, 11, 86, 284
Lebesgue decomposition, 414, 425
Lebesgue density theorem, 31.9, 35.15
Lebesgue function, 31.3
Lebesgue integrable, 221, 225
Lebesgue measure, 25, 43, 167, 171, 177
Lebesgue set, 45
Leibniz formula, 17.8
Lévy distance, 14.5, 25.4, 26.16
Likelihood ratio, 461, 471
Limit inferior, 52, 4.1
Limit of sets, 52
Lindeberg condition, 359
Lindeberg-Lévy theorem, 357
Linear Borel set, 158
Linear functional, 244
Linearity of expected value, 77
Linearity of the integral, 206
Linearly independent reals, 14.7, 30.8
Lipschitz condition, 418
Log-normal distribution, 388
Lower integral, 204, 228
Lower semicontinuous, 29.1
Lower variation, 421
L^p-space, 241
λ-system, 41
Lusin theorem, 17.10
Lyapounov condition, 362
Lyapounov inequality, 81, 277


Mapping theorem, 344, 380
Marginal distribution, 261
Markov chain, 111, 363, 367, 29.11, 429
Markov inequality, 80, 276
Markov process, 435, 510
Markov shift, 312
Markov time, 133
Martingale, 101, 458, 514
Martingale central limit theorem, 475
Martingale convergence theorem, 468
Maximal ergodic theorem, 317
Maximal inequality, 287
Maximal solution, 122
μ-continuity set, 335, 378
m-dependent, 6.11, 364
Mean value, 26.17
Measurable mapping, 182
Measurable process, 503
Measurable rectangle, 231
Measurable with respect to a σ-field, 68, 225
Measurable set, 20, 38, 165
Measurable space, 161
Measure, 22, 160
Measure-preserving transformation, 311
Measure space, 161
Meets, A3
Method of moments, 388, 30.6
Minimal sufficient field, 454
Minimum-variance estimation, 454
Minkowski inequality, 5.10, 242
Mixing, 24.3, 363, 29.10, 34.16
Mixture, 473
μ*-measurable, 165
Moment, 274
Moment generating function, 1.6, 146, 278, 284, 390
Monotone, 24, 162, 206
Monotone class, 43, 3.12
Monotone class theorem, 43
Monotone convergence theorem, 208
M-test, 210, A28
Multidimensional central limit theorem, 385
Multidimensional characteristic function, 381
Multidimensional distribution, 259
Multidimensional normal distribution, 383
Multinomial sampling, 29.8


Negative part, 200, 254
Negligible set, 8, 1.3, 1.9, 44




Neyman-Pearson lemma, 19.7
Nonatomic, 2.19
Nondenumerable probabilities, 526
Nonmeasurable set, 45, 12.4
Nonnegative series, A25
Norm, 243
Normal distribution, 258, 383
Normal number, 8, 1.8, 86, 6.13
Normal number theorem, 9, 6.9
Nowhere dense, A15
Nowhere differentiable, 31.18, 505
Null persistent, 130
Number theory, 393


Open set, A11
Optional sampling theorem, 466
Optimal stopping, 133
Order of dyadic interval, 4
Orthogonal projection, 250
Orthonormal, 249
Ottaviani inequality, 22.15
Outer boundary, 64
Outer measure, 37, 3.2, 165


Pairwise disjoint, A3
Partial-fraction expansion, 20.14
Partial information, 57
Partition, A3
Path function, 308, 493, 500
Payoff function, 133
Perfect set, A15
Peano curve, 179
Period, 125
Permutation, 72, 86, 361
Persistent, 117, 120
Phase space, 111
π-λ theorem, 42
P* measurable, 38
Poincaré theorem, 29.9
Point of increase, 12.9, 20.12
Poisson approximation, 302, 328
Poisson distribution, 257, 299, 375
Poisson process, 297, 436
Poisson theorem, 6.5
Polar coordinates, 226, 261
Pólya criterion, 26.3
Pólya theorem, 118
Positive part, 200, 254
Positive persistent, 130
Power class, 21, A1
Power series, A29
Prékopa theorem, 303
Primitive, 224, 400
Probability measure, 22
Probability measure space, 23
Probability transformation, 14.3
Product measure, 28, 12.12, 233, 487
Product space, 27, 231, 484
Projection, 27, 484
Proper difference, A1
Proper subset, A1
π-system, 41


Rademacher functions, 5, 289, 291
Radon-Nikodym derivative, 423, 460, 470
Radon-Nikodym theorem, 422, 32.8
Random Taylor series, 292
Random variable, 67, 182, 254
Random vector, 183, 255
Random walk, 112
Rank, 4, 320
Ranks and records, 20.8
Rate of Poisson process, 299
Rational rectangle, 158
Realization of process, 493
Record values, 20.9, 21.3, 22.9, 27.8
Rectangle, 158, A16
Recurrent event, 8.17
Red-and-black, 92
Reflection principle, 511
Regularity, 174
Relative frequency, 8
Relative measure, 25.16
Renewal theory, 8.17, 310
Reversed martingale, 472
Riemann integral, 2, 12, 221, 228, 25.12
Riemann-Lebesgue lemma, 345
Riesz-Fischer theorem, 243
Riesz representation theorem, 17.12, 244
Right continuity, 175, 256
Rigid transformation, 172
Risk, 247, 251


Saltus, 188
Sample function, 188
Sample path, 188, 308
Sample point, 18
Sampling theory, 392
Scheffé theorem, 215
Schwarz inequality, 81, 5.6, 249, 276
Second Borel-Cantelli lemma, 60, 4.11, 88
Second category, 1.10, A15
Second-order Markov chain, 8.32
Secretary problem, 114
Section, 231
Selection problem, 113, 138
Selection system, 95, 7.3
Semicontinuous, 29.1




Semiring, 166 

Separable function, 526 

Separable process, 527, 531 

Separable σ-field, 2.11

Separant, 527 

Sequence space, 27, 311 

Set function, 22, 420 

σ-field, 20

σ-field generated by a class of sets, 21

σ-field generated by a random variable, 68, 255

σ-finite on a class, 160

σ-finite measure, 160

Shannon theorem, 6.14, 8.31 

Signed measure, 32.12 

Simple function, 184 

Simple random variable, 67, 185, 254 

Singleton, A1 

Singular function, 407 

Singularity, 421 

Singular part, 425 

Skorohod embedding, 513, 519 

Skorohod theorem, 333 

Southwest, 176, 247 

Space, A1, 17

Square-free integers, 4.15 

Stable law, 377 

Standard normal distribution, 258 

State space, 111 

Stationary distribution, 124 

Stationary increments, 499 

Stationary probabilities, 124 

Stationary process, 363, 494 

Stationary sequence of random variables, 363 

Stationary transition probabilities, 111 

Stieltjes integral, 228 

Stirling formula, 27.18 

Stochastic arithmetic, 2.18, 4.15, 4.16, 5.19, 
6.16, 18.17, 25.15, 393, 30.9, 30.10, 30.11 

Stochastic matrix, 112 

Stochastic process, 298, 308, 482 

Stopping time, 99, 133, 464, 465, 508 

Strong law of large numbers, 8, 9, 11, 85, 6.8, 
282, 312, 27.20 

Strong Markov property, 508 

Subadditivity: 

countable, 25, 162 
finite, 24, 162 

Subfair game, 92, 102 

Submartingale, 462 

Subset, A1

Subsolution, 8.5 




Sufficient subfield, 450 
Superharmonic function, 134 
Supermartingale, 462 

Support line, A33 

Support of measure, 23, 161 
Symmetric difference, A1 
Symmetric random walk, 113, 35.10 
Symmetric stable law, 378 

System, 111 


Tail σ-field, 63, 287, 496
Tarski theorem, 3.8 

Taylor series, A29, 292 

Thin cylinders, 27 

Three series theorem, 290 
Tightness, 336, 380 

Timid play, 108 

Tonelli theorem, 234 

Total variation, 421 

Trajectory of process, 493 
Transformation of measures, 185, 215 
Transient, 117, 120 

Transition probabilities, 111 
Translation invariance, 45, 172 
Triangular array, 359 
Triangular distribution, 348 
Trifling set, 1.3, 1.9, 3.15 
Type, 193 


Unbiased estimate, 251, 454 

Uncorrelated random variables, 7, 277 

Uniform distribution, 258, 348 

Uniform distribution modulo 1, 328, 352 

Uniform integrability, 216, 338 

Uniformly equicontinuous, 26.15 

Uniqueness of extension, 36, 42, 163 

Uniqueness theorem for characteristic 
functions, 346, 382 

Uniqueness theorem for moment generating 
functions, 147, 284, 26.7, 390 

Unit interval, 51 

Unit mass, 24 

Upcrossing, 467 

Upper integral, 204, 228 

Upper semicontinuous, 29.1 

Upper variation, 421 

Utility function, 7.12 


Vague convergence, 371 
Value of payoff function, 134 
van der Waerden function, 31.18 




Variance, 78, 275 

Version of conditional expected value, 445 
Version of conditional probability, 430 
Vieta formula, 1.7 


Wald equation, 22.8 

Weak convergence, 190, 327, 378 

Weak law of large numbers, 5, 11, 86, 284 
Weierstrass approximation theorem, 87, 26.19 




Weierstrass M-test, 210, A28 
Wiener process, 498
With probability 1, 60 


Zeroes of Brownian motion, 507 

Zero-one law, 63, 117, 8.35, 286, 22.12, 314, 
496, 37.4 

Zorn’s lemma, A8 


